PREDICTORS  OF  SUCCESS  IN 
AIR  FORCE  INSTITUTE  OF  TECHNOLOGY 
RESIDENT  MASTER'S  DEGREE  PROGRAMS: 

A  VALIDITY  STUDY 

James R. Van Scotter, Captain, USAF
LSSR  4-83 




The contents of the document are technically accurate, and
no sensitive items, detrimental ideas, or deleterious
information are contained therein. Furthermore, the views
expressed in the document are those of the author(s) and do
not necessarily reflect the views of the School of Systems
and Logistics, the Air University, the Air Training Command,
the United States Air Force, or the Department of Defense.




UNCLASSIFIED 


SECURITY CLASSIFICATION OF THIS PAGE (When Data Entered)


REPORT DOCUMENTATION PAGE

READ  INSTRUCTIONS 

BEFORE  COMPLETING  FORM 

1.  RECIPIENT’S  CATALOG  NUMBER 

4. TITLE (and Subtitle)

PREDICTORS  OF  SUCCESS  IN 

AIR  FORCE  INSTITUTE  OF  TECHNOLOGY 

5. TYPE OF REPORT & PERIOD COVERED

Master's  Thesis 

RESIDENT  MASTER'S  DEGREE  PROGRAMS: 

A  VALIDITY  STUDY 

6. PERFORMING ORG. REPORT NUMBER

7. AUTHOR(s)

James  R.  Van  Scotter,  Captain,  USAF 

8. CONTRACT OR GRANT NUMBER(s)

9. PERFORMING ORGANIZATION NAME AND ADDRESS

School  of  Systems  and  Logistics 

Air  Force  Institute  of  Technology 

WPAFB, OH

10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS

11. CONTROLLING OFFICE NAME AND ADDRESS

Department  of  Communication 

AFIT/LSH,  WPAFB,  OH  45433 

12. REPORT DATE

September  1983 

13. NUMBER OF PAGES

118 

14. MONITORING AGENCY NAME & ADDRESS (if different from Controlling Office)

15. SECURITY CLASS. (of this report)

UNCLASSIFIED 


16. DISTRIBUTION STATEMENT (of this Report)

Approved  for  public  release;  distribution  unlimited. 

17. DISTRIBUTION STATEMENT (of the abstract entered in Block 20, if different from Report)

18. SUPPLEMENTARY NOTES

Dean for Research and Professional Development
Air Force Institute of Technology (ATC)

Wright-Patterson AFB OH 45433

15 SEP 1983

19. KEY WORDS (Continue on reverse side if necessary and identify by block number)

Selection 

Officer  personnel 

Psychological  tests 

Schools 

Psychological measurement

20. ABSTRACT (Continue on reverse side if necessary and identify by block number)

Thesis  Chairman:  Guy  Shane,  PhD 

DD FORM 1473 EDITION OF 1 NOV 65 IS OBSOLETE




The effectiveness of a selection process is directly related to the
strength of the relationships between eligibility criteria and
measures of performance. Criterion-related validity research provides
a means to establish the nature and strength of predictor/criterion
relationships. Predictors used by the Air Force Institute of
Technology (AFIT) in selecting military officers and civilian employees
for graduate school include admission test scores and undergraduate
grade-point averages. Using a sample of 2170 cases spanning the
period from 1977 to 1982, predictor/criterion relationships were
demonstrated for AFIT's in-residence master's degree programs. The
differences in key predictor/criterion relationships among 17 graduate
programs were tested statistically. As a result, it was found that
some master's degree programs could be statistically combined with
others to enhance prediction. Prediction models were developed using
multiple regression. These models were shown to be superior to the
present selection process. Correlation matrices, program groups, and
prediction models are presented.




LSSR  4-83 


PREDICTORS  OF  SUCCESS  IN 
AIR  FORCE  INSTITUTE  OF  TECHNOLOGY 
RESIDENT  MASTER'S  DEGREE  PROGRAMS: 
A  VALIDITY  STUDY 


A  Thesis 

Presented to the Faculty of the School of Systems and Logistics
of the Air Force Institute of Technology
Air University

In Partial Fulfillment of the Requirements for the

Degree of Master of Science in Logistics Management


By 

James R. Van Scotter, BS
Captain,  USAF 

September  1983 


Approved  for  public  release; 
distribution  unlimited 


This  thesis,  written  by 


Captain  James  R.  Van  Scotter 

has  been  accepted  by  the  undersigned  on  behalf  of  the 
faculty  of  the  School  of  Systems  and  Logistics  in  partial 
fulfillment  of  the  requirements  for  the  degree  of 

MASTER  OF  SCIENCE  IN  LOGISTICS  MANAGEMENT 

DATE:  28  September  1983 


READER 


TABLE  OF  CONTENTS 


LIST OF TABLES

CHAPTER

I. INTRODUCTION
    Problem statement
    Background
    Reliability
    Validity
    Analytical methods
    Predictors
    Selection based on cut-off scores
    Other approaches
    Criterion problems
    Departmental differences
    The selection ratio
    Summary
    Research hypotheses

II. METHODS
    Subjects
    Variable definitions
    Data analysis
    Developing prediction models

III. RESULTS
    Validity of the predictors
    Comparing the correlations
    Predictor/criterion correlations for program groups
    Description of present admissions procedures
    Determining the procedure's validity
    Best prediction models
    Economic analysis

IV. DISCUSSION AND CONCLUSIONS
    A review of the hypotheses
    Discussion
    Other findings
    Conclusions

APPENDICES
    A. DEMOGRAPHIC INFORMATION
    B. CORRELATION TEST MATRICES
    C. CORRELATION MATRICES
    D. METHOD USED TO ESTIMATE THE VALIDITY OF
       CURRENT AFIT SELECTION PROCEDURES
    E. TAYLOR-RUSSELL TABLES

SELECTED BIBLIOGRAPHY
    A. REFERENCES CITED
    B. RELATED SOURCES


LIST  OF  TABLES 


Table

1   Correlations of predictors with GGPA (entire sample)
2   Correlations of predictors with GGPA (Group 41)
3   Correlations of predictors with GGPA (Group 42)
4   Correlations of predictors with GGPA (Group 43)
5   Correlations of predictors with GGPA (Group 44)
6   Correlations of predictors with GGPA (Group 45)
7   Multiple regression equation (entire sample using cases
    with both GRE and GMAT)
8   Multiple regression equation (entire sample using cases with GMAT)
9   Multiple regression equation (entire sample using cases with GRE)
10  Multiple regression equation (Group 41)
11  Multiple regression equation (Group 42)
12  Multiple regression equation (Group 43)
13  Multiple regression equation (Group 44)
14  Multiple regression equation (Group 45)


CHAPTER  I 


INTRODUCTION 

The United States Air Force has been a leader in
the design, development, and management of new technologies
throughout its history. The evolution of military air
power has been paralleled by a growth in the size and
complexity of the Air Force. By consistently exploiting
the military applications of technological advances, the
Air Force has been able to counter the threats posed by
potential adversaries and increase its own capability to
support the national objectives of the United States. In
meeting this challenge, the Air Force has made a strong
commitment to advanced technical and management education.
The Air Force Institute of Technology (AFIT) is a concrete
example of that commitment. Through AFIT's programs,
military officers and civilian employees of the Department
of Defense are sponsored in both undergraduate and graduate
level programs in management, engineering, and other
related disciplines.

Although AFIT administers a wide range of
educational  programs,  its  in-residence  master's  degree 
programs  are  of  particular  interest. 




These programs are designed to give selected
officers the ability to analyze and solve complex
technical and managerial problems faced by the Air
Force and the Department of Defense. (United
States Air Force Manual 30-3, Volume I, para 4-9
(a), 1981)

This unique emphasis provides students with many of the
skills necessary for successful performance at the higher
levels of the Air Force organization and opens career
opportunities for them. Thus, both the individual students
and the Air Force benefit.

Class size limitations and the need to maximize
the return on its investment require the Air Force to
employ a selective admissions policy. Only those officers
whose academic abilities, motivation, and job performance
indicate a high probability of success are admitted to
AFIT's master's degree programs. To differentiate between
those students who are likely to succeed and those who are
not, AFIT has established basic eligibility criteria.
Minimum requirements vary somewhat from program to program
but, in general, AFIT requires an undergraduate
grade-point average (UGPA) of 2.3 or better (on a 4.0
scale) and either Graduate Record Examinations (GRE) Verbal
and Quantitative scores totaling 1000 or a Graduate
Management Admissions Test (GMAT) score of 300 or better
(Air Force Institute of Technology, 1982).

Undergraduate grade-point averages have been widely
used as eligibility criteria in graduate and professional
schools. However, in recent years, grade inflation, a wide
range of grading practices, and the advent of
non-traditional degree programs have made this index
increasingly difficult to interpret (Knapp and Hamilton,
1978). As a result, other indicators, especially
standardized tests such as the GRE and GMAT, have become
increasingly important.

Professionally prepared standardized tests can
provide valuable information about the skills and aptitudes
of potential graduate students. Information about the
distribution of test scores for recent examinees is made
available to graduate schools by test publishers. This
enables graduate school admissions departments to evaluate
students from a wide variety of backgrounds on a common
scale, and to compare the performance of each individual
with national averages.

When test scores are used to differentiate between
applicants, an underlying assumption is that they measure
attributes that are strongly related to academic
performance. Another assumption is that scores lie on a
continuum and can be ranked. Following this logic, it is
further assumed that high scores indicate high potential,
average scores indicate average potential, and low scores
indicate low potential.




These assumptions may or may not be valid. The
degree to which they are valid in predicting success in a
specific situation is a critical concern. Many factors can
influence the accuracy of such predictions, but they can be
grouped into three categories: the test itself, individual
differences, and differences in how "success" is defined.


A test cannot be valid in general; it is valid
for a purpose. Indeed, a test may be both valid
and invalid. For example, skill in algebra may be
a valid predictor of science and math grades, but
it may not be valid for history or English. (Green,
1981)

Green's point is well taken. A test must measure pertinent
skills if it is to be useful. The relationship between a
predictor and a criterion (measure of success) is expressed
in terms of the predictor's criterion-related validity.
Test users must have evidence of the criterion-related
validity of the test to ensure that their decisions are
based on relevant information.

There is also another issue. Those involved with
the use of tests for selection must be aware of ethical
considerations. In an effort to establish some ethical
guidelines for test users and publishers, the American
Psychological Association published a handbook entitled
Standards for Educational and Psychological Tests and
Manuals (1966). Its purpose is to provide a common
framework for evaluating tests and test materials. The
reason for its development follows:




Almost any test can be useful for some
functions and in some situations, but even the best
test can have damaging consequences if used
inappropriately. Therefore, primary responsibility
for the improvement of testing rests on the
shoulders of test users. (American Psychological
Association, 1966, p. 6)

This general admonition was followed with a more specific
discussion of the criterion-related validity issue.

Local collection of evidence on
criterion-related validity is frequently more
useful than published data . . . In cases where
criteria differ from one locality to another or
from one institution to another, no published data
can serve all localities. For example, the
validity of a certain test for predicting grades at
a college with a unique kind of curriculum may be
quite different from the published validity of the
same test that was based on more conventional
colleges. (American Psychological Association,
1966, p. 18)

Of course, it can be argued that until the criterion-related
validity of a test has been demonstrated for a specific
purpose, a user cannot ethically rely on published data.
This type of argument, along with practical concerns about
the effectiveness of the GRE and GMAT as predictors, has
led many graduate schools to sponsor local validity
studies. The Educational Testing Service (ETS), developer
and publisher of the GMAT and GRE, has been even more
specific in its recommendations.

It is incumbent on any institution using GMAT
scores in the admissions process that it
demonstrate empirically the relationship between
test scores and measures of performance in its
academic program. (Graduate Management Admissions
Council, 1982)


All parties to the development of the Graduate
Record Examinations (GRE) Program, from the outset,
have recognized the need for empirical evidence
regarding the predictive validity of the GRE tests
and other preadmissions variables. (Wilson, 197?)

Independent researchers have reached similar conclusions:

Each professional school should carry on
continual research on the effectiveness of its
selection procedures and various other aspects of
its total program. Selection procedures need to be
empirically validated, since one cannot assume that
they will be effective in one situation if they
have been so in others. (Furst, 1950)

Practical considerations are very important. The
value of any selection instrument is directly related to
the savings it provides the organization. According to
AFIT's 096CR financial report for 1981, the average costs
for sponsoring graduate students in the Engineering and
Logistics Management Schools were $82,892.68 and $67,258.66,
respectively (Air Force Institute of Technology, 1981).
There are many ways to view this investment, but from any
perspective it is reasonable to assume that the Air Force
gains more from graduates than from non-graduates. If
non-graduation is viewed as a total loss, even a small
improvement in the selection procedure could result in
significant savings.
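The savings argument above is simple arithmetic. The sketch below uses the two per-student costs reported in the text; the class size and the before/after non-graduation rates are hypothetical values chosen only for illustration, not AFIT figures, and each non-graduate is counted as a total loss, as the text suggests.

```python
# Per-student costs from AFIT's 1981 financial report (as quoted in the text).
COST_ENGINEERING = 82_892.68
COST_LOGISTICS = 67_258.66


def avoided_loss(cost_per_student, class_size, old_rate, new_rate):
    """Dollars no longer lost when the non-graduation rate falls,
    treating every non-graduate as a total loss."""
    old_losses = class_size * old_rate   # expected non-graduates before
    new_losses = class_size * new_rate   # expected non-graduates after
    return (old_losses - new_losses) * cost_per_student


# Hypothetical: a class of 100 whose non-graduation rate drops from 10% to 8%.
savings = avoided_loss(COST_LOGISTICS, 100, 0.10, 0.08)
print(f"${savings:,.2f}")
```

Under these assumed rates, two fewer non-graduates in a single Logistics Management class would save about $134,517.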

When tests are used in the selection process,
it should be on the basis of demonstrated
improvement in the selection with the test score
over selection without the test score. (Womer,
1968, p. 57)


Local validity studies can furnish this information and may
also point out better ways of combining the various
predictors. A tailor-made prediction model can be
developed for a particular situation.

Admissions eligibility criteria can become
outmoded. If this happens, the efficiency of the selection
process can be seriously affected.

A test with significant criterion-related
validity five or ten years ago may not have the
same relationship today. This will be
particularly true whenever there is any change in
the criterion. A college that becomes more
selective over a period of years may change its
grading practices enough to alter the predictive
validity of a college aptitude test. (Womer, 1968,
p. 61)


Problem  statement 

GRE  and  GMAT  test  scores  are  used  in  determining 
the  eligibility  of  candidates  for  AFIT's  resident  master's 
degree  programs  in  the  School  of  Systems  and  Logistics  and 
in  the  School  of  Engineering.  While  other  factors  are  also 
considered,  GRE  and  GMAT  test  scores  are  heavily  weighted 
by  the  AFIT  Registrar's  staff. 

Simply  stated,  the  problem  is  that  the 
relationships  between  the  various  indicators  of  student 
potential  and  academic  success  in  these  programs  have  not 
been  demonstrated.  More  specifically,  the  validity  of  the 
GRE  and  GMAT  as  predictors  of  academic  performance  for  most 




of these programs has not been established, nor has the
validity of existing selection procedures been analyzed.
Empirical research is clearly called for. Until this
research is accomplished, no basis exists for criticizing
or endorsing AFIT evaluation procedures. All we can say
conclusively is that we do not know whether or not the
evaluation process is accurate. An empirical study may
provide support for the AFIT Registrar's selection process,
including its use of GRE and GMAT scores, or it may suggest
that other methods could be more useful. In either case,
it should furnish a basis for evaluating past, present, and
future admissions practices.

The primary purpose of this study is to evaluate
the criterion-related validity of the GRE and GMAT and
other variables as predictors of success in AFIT resident
master's degree programs. To establish a basis for
comparison, the validity of the present selection process
was investigated. Finally, prediction models were
developed and their effectiveness compared with the
historical accuracy of AFIT admissions decisions.


Background

The criterion-related validity of the Graduate
Record Examinations (GRE) and Graduate Management
Admissions Test (GMAT) in predicting student success in
graduate schools has been the subject of many studies. The
GRE has become firmly established as a device for
evaluating the relative academic potential of prospective
graduate students throughout a wide range of academic
disciplines. The GMAT, which is designed for use by
business and management schools, has also become an
important tool in graduate student selection (Hecht and
Powers, 1982). Both tests have known reliability and are
general enough to measure the knowledge, aptitudes, and
skills of a wide variety of individuals from different
educational backgrounds (Educational Testing Service,
1981).

The GRE and GMAT are standardized tests with norms
and scaled scores. Standardization refers to the
administration, apparatus, and scoring methods associated
with the use of the measurement device. Through carefully
controlled formal administration procedures, Educational
Testing Service (ETS) ensures that each time a test is
given the same specific steps are followed by the test
proctor. Each version of a test is identical in appearance
to other versions of the same test. Each has the same
number of questions, the same type of answer sheets, and
each follows the same format. Each version is analyzed to
ensure its content parallels that of other versions
(Educational Testing Service, 1981).


The term "scaled score," as it is used by ETS,
refers to the practice of using a reference group to
establish a scale against which the performance of
subsequent examinees can be measured. According to ETS,
the reference group for the GRE consisted of a large group
of college seniors from eleven undergraduate institutions
who took both the GRE verbal and quantitative subtests in
1952. The mean score for this entire group was set
equal to 500, and the standard deviation was set at 100.
Through statistical manipulation of test score data, ETS
places the scores of subsequent groups of examinees on
this same scale. ETS asserts that comparisons between the
scores of two (or more) examinees are useful and valid when
consideration is given to errors of measurement. That is,
ETS is careful to point out that small differences in test
scores are relatively meaningless (Educational Testing
Service, 1981).
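A minimal sketch of the scaling idea, assuming a simple linear transformation: each raw score is converted to a z-score relative to the reference group and then re-expressed on a scale with mean 500 and standard deviation 100. The reference raw scores are hypothetical, and ETS's actual procedures (which equate each new test form to the original scale) are more involved than this.

```python
from statistics import mean, pstdev


def make_scaler(reference_raw_scores, scale_mean=500.0, scale_sd=100.0):
    """Return a function that maps raw scores onto a scale on which the
    reference group has the given mean and standard deviation."""
    ref_mean = mean(reference_raw_scores)
    ref_sd = pstdev(reference_raw_scores)   # population SD of the reference group

    def to_scaled(raw):
        z = (raw - ref_mean) / ref_sd       # standing relative to the reference group
        return scale_mean + scale_sd * z    # re-expressed on the 500/100 scale

    return to_scaled


# Hypothetical raw scores for a small reference group.
reference = [40, 45, 50, 55, 60]
to_scaled = make_scaler(reference)
print(to_scaled(50), round(to_scaled(60), 1))
```

A reference-group member at the group mean lands at 500, and the reference group as a whole ends up with mean 500 and standard deviation 100 on the new scale.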

Tests, like other tools, are designed with specific
purposes in mind. As an aptitude test, the GRE is designed
to measure the effects of learning that occurred over a
relatively long period of time under relatively
uncontrolled conditions. Its purpose is to predict
performance. This can be contrasted with the use of
achievement tests to measure the learning and skills that a
person has acquired in a more structured formal setting
(Anastasi, 1974, Educational Testing Service, 1981). The
GMAT, which measures knowledge in a specific area to a
greater extent than the GRE does, is more of an achievement
test than an aptitude test. It is important to realize
that there is no absolute distinction between the two types
of tests, since similar items appear in both. The
distinction is based on the use of the scores from the
tests rather than any inherent differences in the tests
themselves.

Both the GRE and the GMAT are divided into
subtests. The GRE has three subtests: verbal,
quantitative, and analytical. The analytical subtest was
added to the GRE in 1977, and scores for it have been
reported since 1978. ETS cautions that this test should
not be used for decision making until its validity can be
demonstrated. The analytical test is designed to measure
an individual's ability to reason in a logical way, to
reach sensible conclusions, and to identify the important
factors in a situation. The purposes of the verbal and
quantitative subtests are to measure aptitudes in those
areas. The GMAT contains subtests in only the verbal and
quantitative areas.

On the surface, quantitative measures are easier to
interpret than subjective measures. They fit more easily
into decision formulas. Certainly, it is easy to
pick the higher of two scores. It is much more difficult
to make a decision based on individual traits such as
motivation, persistence, and maturity, though most people
would agree that these factors contribute to an
individual's performance. In fact, subjective appraisals
have been shown to be less effective than decisions based
on statistical measures in many circumstances because of
variability among raters and differences in criterion
definitions (Sawyer, 1966). In addition, quantitative
measures tend to lend credibility to a selection process
(Furst, 1950, Marston, 1971). In terms of practical
results, the use of test scores has, in general, improved
the efficiency of many organizations both inside and
outside the educational arena (Travers, 1954).

A substantial body of research deals with the
effectiveness (or ineffectiveness) of the GRE as a
predictor of success in graduate education. Similar
research on the GMAT is limited by comparison. Criticisms
of the predictive validity of the GRE and the GMAT have
centered on the low correlations that have been found
in studies of the relationships between these predictors
and the criterion of academic success as measured by
graduate grade-point average (GGPA). The key factors
contributing to the low correlations are explored below.


Reliability


The concept of test reliability deals with the
accuracy of the measurement.

In its broadest sense, test reliability
indicates the extent to which individual
differences in test scores are attributable to
"true" differences in the characteristics under
consideration and the extent to which they are
attributable to chance errors. (Anastasi, 1976)

The Educational Testing Service is responsible for
developing and managing the GRE and GMAT programs. ETS has
consistently demonstrated that the reliability of both
tests is above .90 (Hecht and Powers, 1982, Educational
Testing Service, 1981). As pointed out by Cureton,
reliability is a necessary prerequisite for meaningful
validity (Cureton, 1950). For the purpose of predicting
student performance in graduate school, a high measure of
reliability increases our confidence that a given
prediction is meaningful.
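The dependence of validity on reliability can be made concrete with a standard result from classical test theory, which is not stated in the sources quoted here: an observed validity coefficient cannot exceed the square root of the product of the predictor's and the criterion's reliabilities. The sketch below computes that ceiling and the related correction for attenuation. The .90 reliability comes from the text; the .80 criterion reliability and the .40 observed correlation are hypothetical.

```python
from math import sqrt


def validity_ceiling(rel_predictor, rel_criterion=1.0):
    """Upper bound on an observed predictor/criterion correlation
    under classical test theory: sqrt(r_xx * r_yy)."""
    return sqrt(rel_predictor * rel_criterion)


def disattenuate(r_xy, rel_predictor, rel_criterion=1.0):
    """Correction for attenuation: estimated correlation between true
    scores, given the observed correlation and both reliabilities."""
    return r_xy / sqrt(rel_predictor * rel_criterion)


print(round(validity_ceiling(0.90), 3))                # ~0.949, perfectly reliable criterion
print(round(validity_ceiling(0.90, 0.80), 3))          # ~0.849, hypothetical GGPA reliability
print(round(disattenuate(0.40, 0.90, 0.80), 3))        # ~0.471
```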

Validity

Validity is concerned with what tests measure. In
general, it can be described as the usefulness of the
measurement. Criterion-related validity, the central
concept in prediction, combines two types of validity:
concurrent validity and predictive validity.
Criterion-related validity emphasizes the relationship
(i.e., correlation) between a test score (predictor) and
some other measure of behavior, the criterion of success
(Womer, 1968). The criterion of success is some future
performance that is of interest. When considering
prediction of academic performance, the criteria of
graduate grade-point average and graduation/non-graduation
have frequently been employed. These criteria, and others,
are used to improve the accuracy with which schools select
students that are likely to succeed (Womer, 1968).
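Because criterion-related validity is expressed as a correlation, a validity coefficient is simply the Pearson correlation between predictor scores and criterion scores for the same students. A minimal sketch follows; the GRE totals and graduate GPAs are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)                     # predictor spread
    syy = sum((b - my) ** 2 for b in y)                     # criterion spread
    return sxy / sqrt(sxx * syy)


# Hypothetical data: GRE Verbal + Quantitative totals and graduate GPAs.
gre = [1000, 1100, 1150, 1250, 1300]
ggpa = [3.2, 3.4, 3.3, 3.7, 3.6]
print(round(pearson_r(gre, ggpa), 3))
```

With real data, the same coefficient would be computed separately for each program or program group whose criterion differs.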

Brogden demonstrated that the correlation
coefficient represents the improvement in selection obtained
by using a predictor, over random selection, as a proportion
of the improvement that would be obtained by selecting on
the criterion itself. He interpreted this "as showing that the
correlation coefficient is a direct index of predictive
efficiency" (1946). Brogden argues convincingly that
decision makers should consider the improvement in
selection obtained through the use of predictors in light
of the costs associated with obtaining and interpreting
them. Brogden's point is that the users of a test are
responsible for validating its utility in both economic and
predictive terms. Womer makes the same argument. In
discussing the use of standardized tests in selection, he
states: "The development of local validity studies is the
best possible approach to criterion-related validity."
(1968, p. 61)
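Brogden's result can be illustrated under the assumptions in which it is usually stated: predictor and criterion are standardized and jointly normal, and applicants are selected top-down at a fixed selection ratio. The mean criterion z-score of those selected on the predictor is then r times the mean obtained by selecting on the criterion itself. The sketch computes the criterion-based gain analytically and checks the ratio by simulation; the values of r and the selection ratio are hypothetical.

```python
import random
from statistics import NormalDist, fmean


def mean_z_of_top(selection_ratio):
    """Mean z-score of the top `selection_ratio` of a standard normal
    population (truncated-normal mean: pdf(cutoff) / ratio)."""
    cutoff = NormalDist().inv_cdf(1 - selection_ratio)
    return NormalDist().pdf(cutoff) / selection_ratio


def simulate_gain_ratio(r, selection_ratio, n=100_000, seed=1):
    """Monte Carlo check: criterion gain from selecting on the predictor
    divided by the gain from selecting on the criterion itself."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)                                # predictor z-score
        y = r * x + (1 - r * r) ** 0.5 * rng.gauss(0, 1)   # criterion z-score, corr r
        pairs.append((x, y))
    k = int(n * selection_ratio)
    by_predictor = sorted(pairs, key=lambda p: p[0], reverse=True)[:k]
    by_criterion = sorted(pairs, key=lambda p: p[1], reverse=True)[:k]
    return fmean(y for _, y in by_predictor) / fmean(y for _, y in by_criterion)


r, ratio = 0.40, 0.30
print(round(mean_z_of_top(ratio), 3))        # gain when selecting on the criterion
print(round(r * mean_z_of_top(ratio), 3))    # expected gain when selecting on the predictor
print(round(simulate_gain_ratio(r, ratio), 3))  # should be close to r
```

The simulated ratio hovers near r, which is the sense in which the correlation is a "direct index of predictive efficiency."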




If any positive correlation between the predictor
and the criterion is achieved, predictions based on the
predictor will be more accurate than random chance.
Although decision makers would prefer perfect prediction,
validity coefficients are usually less than .60 in
practice (Womer, 1968, p. 61). Traxler noted that:

In view of the restricted range of talent
usually represented in correlations between test
scores and marks at the graduate level,
correlations in the neighborhood of .50 may be
regarded as satisfactory. (1952, p. 476)

Validity coefficients are largest in groups that encompass
a wide range of ability levels. In groups where the range
of ability is narrow, validity coefficients tend to be
low. As the range of abilities in a group becomes narrower
and narrower, it becomes progressively more difficult to
differentiate between members of the group (Cronbach,
1970).

This phenomenon is known as restriction in range.
The selection process itself contributes to the problem.
As products of successively more stringent
screenings, graduate students form a group that is very
homogeneous compared to the population as a whole. The
difficulty of achieving success at higher and higher levels
increases, compensating somewhat for the effects of
restriction in range. Even so, the usual pattern is for
correlation coefficients to decrease as groups become
smaller and more homogeneous. In discussing the effects of
restriction in range on prediction studies, Furst and
Roelfs stated:

Much of the so-called inconclusiveness has come
from low correlations and these, in turn, from
persisting conditions, especially restrictions in
range owing to selective admissions and attrition.
(1979, p. 147)
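The size of the range-restriction effect can be estimated with a standard psychometric formula that is not given in the sources quoted above: Thorndike's Case 2 correction for direct range restriction. Assuming selection acts directly on the predictor, the predictor/criterion regression is linear, and the residual variance is constant, a full-range correlation R shrinks to r = Ru / sqrt(1 - R^2 + R^2 u^2) in a group whose predictor standard deviation is u times the full-range value. The sketch applies the formula in both directions; the .50 correlation and the values of u are hypothetical.

```python
from math import sqrt


def restricted_r(r_full, u):
    """Correlation expected after direct selection on the predictor
    (Thorndike Case 2), with u = SD(restricted) / SD(unrestricted)."""
    return r_full * u / sqrt(1 - r_full ** 2 + (r_full * u) ** 2)


def corrected_r(r_restricted, u):
    """Inverse of restricted_r: estimate the full-range correlation
    from the correlation observed in the selected group."""
    k = 1 / u   # ratio of unrestricted to restricted SD
    return r_restricted * k / sqrt(1 - r_restricted ** 2 + (r_restricted * k) ** 2)


# A hypothetical full-range validity of .50 under increasing selectivity.
for u in (1.0, 0.8, 0.6, 0.4):
    print(u, round(restricted_r(0.50, u), 3))
```

Applying `corrected_r` to a coefficient observed among admitted students gives a rough estimate of the validity in the full applicant pool, provided the assumptions above hold.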

Another factor that tends to reduce validity
coefficients is compensatory admissions practices. When
students are admitted to graduate school despite low test
scores, it is usually because the school is aware of other
factors that compensate for the test scores. For instance,
students with low test scores may be admitted on the basis
of strong UGPAs. If this happens often enough,
correlations between GRE scores and GGPA for the body of
students are likely to be lower than they would have been
had selection been based on GRE scores alone. To the
extent that compensatory admissions practices are used,
validity coefficients will be reduced (Livingston and
Turner, 1982). Robertson and Nielsen (1961) noticed the
same effect in their study.

None of this is meant to suggest that admissions officers should use GRE or GMAT scores exclusively as selection criteria. Other factors may indeed be very useful, and they are commonly used in combination with test scores. According to Cronbach (1970), adding other relevant factors to a prediction model will generally improve validity coefficients.
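For two predictors, the size of this improvement can be computed directly from the three pairwise correlations with the standard formula for the squared multiple correlation, R^2 = (r1^2 + r2^2 - 2*r1*r2*r12) / (1 - r12^2), where r1 and r2 are each predictor's correlation with the criterion and r12 is the correlation between the predictors. The correlations used below are hypothetical, not values from any study reviewed here.

```python
import math


def multiple_r(r1, r2, r12):
    """Multiple correlation R of a criterion with the best linear
    combination of two predictors.

    r1, r2 -- correlations of predictors 1 and 2 with the criterion
    r12    -- correlation between the two predictors (|r12| < 1)
    """
    if abs(r12) >= 1:
        raise ValueError("predictors must not be perfectly correlated")
    r_squared = (r1 ** 2 + r2 ** 2 - 2 * r1 * r2 * r12) / (1 - r12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical values: a test and UGPA each correlate .30 with GGPA
print(round(multiple_r(0.30, 0.30, 0.0), 2))   # 0.42 (unrelated predictors)
print(round(multiple_r(0.30, 0.30, 0.50), 2))  # 0.35 (overlapping predictors)
```

The second case shows why the gain depends on how much new information the added predictor carries: the more it overlaps with the first predictor, the smaller the improvement.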


Analytical  methods 

Prediction is either statistical or clinical. Statistical prediction uses data on the past performance of groups to predict future performance. Clinical prediction is judgmental and may be based on theoretical considerations (Sawyer, 1966). According to Thorndike, the clinical method's only advantage is that it:

permits combination of scores in other than a
linear manner. It permits a maximum of flexibility
in that any pattern, no matter how complex or
unique, may be recognized and weighted. For this
extreme flexibility to be an advantage, it is
necessary (1) that special patterns and
combinations of tests, not well represented by a
linear combination of scores, be important for
success on the job and (2) that there be clinicians
available who have the insight to discover those
special patterns and the skill to recognize them
whenever they appear. We may well be skeptical on
both counts, but especially on the second. It
represents a severe demand on a clinician's insight
to expect him to discover better ways of using test
scores than will be given by the best linear
combination of those scores, and then to be
consistent in identifying and interpreting those
patterns when they reappear. (1949, p. 201)

Statistical prediction techniques involving multiple regression, correlation coefficients, or both have been used in all but one of the studies reviewed in this report. By collecting data on several predictor variables and using stepwise regression or factor analysis, researchers obtain information about the relative contribution of various predictors to the model. Often, the measures identified as the strongest contributors were then entered into a model with no differential weights applied to them; each was simply given the same weight. This practice is known as unit weighting.

Jensen  pointed  out  that: 

Given  a  set  of  predictive  measures  from  which 
it  is  desired  to  predict  graduate  scholastic 
achievement  of  different  groups,  equal  powers  of 
prediction  should  not  be  arbitrarily  given  to  each 
or any combination of these variables. Empirical
tests  should  first  be  made  to  ascertain  differences 
in  group  performance  based  on  the  predictive  and 
criterion  variables  and  data  weight  derived  for 
each  member  of  the  predictive  team  (1953,  p.323). 

By the term "predictive team," Jensen refers to the group of predictors the researcher considers logically related to performance on the criterion. His argument against arbitrary weighting of predictors is sensible, especially when techniques such as multiple linear regression, which assigns weights statistically, are available.
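The difference between unit weighting and statistically derived weights can be sketched as follows. The data are simulated and the true predictor weights are hypothetical; the helper `compare_weighting` is an illustrative name, not part of any package. It contrasts the correlation achieved by an unweighted sum of standardized predictors with that of an ordinary least-squares composite fitted to the same data.

```python
import numpy as np


def standardize(a):
    """Convert each column to z-scores (mean 0, standard deviation 1)."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def compare_weighting(X, y):
    """Return (R with unit weights, R with least-squares weights, fitted weights).

    X -- n x k array of predictor scores; y -- length-n criterion (e.g. GGPA).
    """
    Z = standardize(X)
    # Unit weighting: simple sum of the standardized predictors
    unit_composite = Z.sum(axis=1)
    r_unit = np.corrcoef(unit_composite, y)[0, 1]
    # Regression weighting: least-squares weights estimated from the data
    design = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    r_reg = np.corrcoef(design @ coef, y)[0, 1]
    return r_unit, r_reg, coef[1:]


# Hypothetical data: predictor 0 matters far more than predictor 1
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = 0.6 * X[:, 0] + 0.1 * X[:, 1] + rng.normal(size=500)
r_unit, r_reg, weights = compare_weighting(X, y)
print(f"unit-weighted R = {r_unit:.2f}, regression-weighted R = {r_reg:.2f}")
```

Because least squares maximizes the correlation in the sample used to derive the weights, the regression composite can never do worse than unit weights in that sample; whether the advantage holds up in a new sample is a separate, cross-validation question.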

Two studies, Madaus and Walsh (1965) and Covert and Chansky (1977), set out to determine optimal prediction models by dividing large groups into smaller, more specific ones. Through this approach they sought to demonstrate that the performance of different groups of people can be best predicted using different prediction models. Their hypothesis was that alternate weighting strategies, based on the characteristics of subgroups, would be more effective than simple unit weighting. Their efforts were successful. Even though Jensen (1953) had called for research on optimal predictor weighting strategies, efforts in this direction have been limited (Covert and Chansky, 1977).

Data collection is statistical (or mechanical) if rules can be prescribed to ensure that clinical judgment is not involved. Self-reported and clerically obtained data, including psychometric tests, biographical data, and grade reports, are examples of statistical data. Interviews and judges' ratings, unless strictly limited to recording prespecified characteristics, are clinical in nature (Sawyer, 1964).

Once data of either type are collected, they are combined in prediction models through stepwise regression or other statistical techniques. These methods identify the predictor with the highest correlation with the criterion and build the prediction model around it. In a step-by-step process, the relative contribution of each predictor to the model is evaluated, and the predictors are added to the model in order of their contribution. Once the model cannot be improved by adding another predictor, the process is complete. Sawyer noted that when data are collected by both the statistical and the clinical methods, the advantage of statistical combination is greatest. In addition, he found that: "The present analysis finds the mechanical mode of combination always equal or superior to the clinical mode..." (1966).
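The forward stepwise procedure described above can be sketched in a few lines. This is a simplified illustration, not the algorithm of any particular statistical package: at each step it adds the predictor that most increases R^2, and it stops when the best available gain falls below a hypothetical threshold (`min_gain`) instead of applying the F-to-enter significance tests that statistical packages typically use. The data are simulated.

```python
import numpy as np


def r_squared(X, y, cols):
    """R^2 of an ordinary least-squares fit of y on the listed columns of X."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return 1 - residuals.var() / y.var()


def forward_stepwise(X, y, min_gain=0.01):
    """Add predictors one at a time in order of their contribution to R^2.

    Stops when no remaining predictor raises R^2 by at least min_gain
    (an illustrative stopping rule, not a significance test).
    Returns the selected column indices in order of entry.
    """
    selected, current = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        gains = {c: r_squared(X, y, selected + [c]) - current for c in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < min_gain:
            break
        selected.append(best)
        remaining.remove(best)
        current += gains[best]
    return selected


# Hypothetical data: column 0 matters most, column 1 somewhat, column 2 not at all
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 3))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=2000)
print(forward_stepwise(X, y))
```

The first predictor entered is, by construction, the one with the highest correlation with the criterion, matching the description above.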

Predictors

The literature on validity studies makes it clear that there are nearly as many approaches to prediction as there are researchers. Although this section focuses on the role of predictors, some discussion of criterion measures and research methodology is inevitably included. Depending on the goals of a particular study, a wide range of predictors has been employed.

Thacker and Williams (1974) reviewed twelve studies of GRE predictive validity spanning the period from 1937 to 1970. In ten of the twelve studies, GGPA was the primary criterion variable. One study (Robertson and Nielsen, 1961) used faculty ratings, and another (Law, 1960) used pass/fail results on doctoral comprehensive examinations as the criterion. Six of the ten studies using GGPA as the criterion found correlations that were either not statistically significant or too low to be used effectively in prediction.

Based on these results, Thacker and Williams' conclusion that the GGPA criterion is of doubtful predictive value is not surprising. Other researchers, including Marston (1971) and Nagi (1975), have come to similar conclusions. Thacker and Williams reported that the limited range and inherent variability of the GGPA criterion were partly responsible for this finding. They also noted that "the use of other measurement criteria has not consistently yielded improved correlations" (Thacker and Williams, 1974). Given the relatively small sample sizes (N was less than 30 in three of the five studies) and the likelihood that other factors also influenced the size of the correlations, this conclusion is to be expected.

Using faculty ratings as the criterion of success, Robertson and Nielsen (1961) arrived at the same conclusions. In their study, nine faculty members each rated fifty students according to their perceptions of the students' ability to complete a psychology doctoral program. The ratings were then combined to form a composite score for each ratee. The mean GRE score correlated .27 with this criterion, at the .03 level of significance. The authors concluded that the results were too weak to be used in prediction. However, the combination of mean GRE scores and UGPA in math and science courses correlated .44 with the criterion at the same significance level, indicating that the combination of the two was a better predictor than the GRE alone. While this supports Cronbach's observation that increasing the number of predictors will generally increase correlations, it is important to note that Cronbach was referring to a general outcome of adding more variables (information) to a regression model, not to a specific situation (Robertson and Nielsen, 1961; Cronbach, 1977).

Nagi used the GRE as a predictor of a dichotomous criterion: completion or non-completion of a doctoral education program. He was unable to find any significant correlations, despite using a sample that included thirty non-graduates as well as thirty-three graduates (1975). One might expect the inclusion of the non-graduates to increase the heterogeneity of the sample and therefore the size of the correlations. However, this strategy was ineffective. Because the non-graduates had also been selected for the program, their scores on the predictors were similar to the scores of those who completed it. The study may also be criticized for its small sample size, which may have prevented it from achieving more conclusive results.

Camp and Clawson examined the predictive validity of the GRE with respect to the criterion of GGPA. They obtained a correlation of .24 for the total GRE score (the sum of the GRE verbal and quantitative subtest scores) and a correlation of .27, significant at the .01 level, for the verbal subtest alone. They concluded that the results were not strong enough to be useful in predicting success for a group such as the 135 Master of Arts in Counseling candidates they studied (Camp and Clawson, 1979). However, in view of Brogden's (1946) finding that even small improvements in selection can result in significant benefits to the organization, Camp and Clawson's conclusion appears premature. Many studies can be criticized on the same grounds.

Selection  based  on  cut-off  scores 

To determine the effectiveness of the GRE in discriminating between successful and unsuccessful students, Borg (1963) used a dichotomous criterion. Students were judged successful if their GGPA was equal to or greater than 3.0, and unsuccessful if it was less than 3.0. To test the hypothesis that successful students can be differentiated from unsuccessful students, Borg created score intervals equal to one-half of one standard deviation and computed the number of scores that fell in each of the six intervals. He determined that a GRE verbal cut-off score, set at one-half standard deviation below the mean score for his sample of 179, would have eliminated 72% of the unsuccessful students. The same cut-off score would have had the undesired effect of eliminating 27% of the successful students. As a result, admission would have been denied to 41 successful students and 21 unsuccessful students. Based on these findings, Borg concluded that a GRE verbal cut-off score should not be used at Utah State University (1963).
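The arithmetic behind this kind of cut-off analysis can be sketched as below. The function `cutoff_effects` and the eight scores are hypothetical illustrations, not Borg's data. The cut-off is set at one-half standard deviation below the sample mean, and the function reports what proportion of each criterion group it would have screened out. The population standard deviation is assumed here; the source does not say how Borg computed his.

```python
import statistics


def cutoff_effects(scores, successful, sd_fraction=0.5):
    """Apply a cut-off at (mean - sd_fraction * SD) and report its effects.

    scores     -- predictor scores (e.g. GRE verbal) for admitted students
    successful -- parallel list of booleans (e.g. GGPA >= 3.0)
    Returns (cut-off, share of unsuccessful eliminated, share of successful eliminated).
    """
    cut = statistics.mean(scores) - sd_fraction * statistics.pstdev(scores)
    unsuccessful = [s for s, ok in zip(scores, successful) if not ok]
    succeeded = [s for s, ok in zip(scores, successful) if ok]
    share_bad_cut = sum(s < cut for s in unsuccessful) / len(unsuccessful)
    share_good_cut = sum(s < cut for s in succeeded) / len(succeeded)
    return cut, share_bad_cut, share_good_cut


# Hypothetical sample of eight students
scores = [40, 45, 50, 55, 60, 65, 70, 75]
successful = [False, False, True, True, True, True, True, False]
cut, bad_cut, good_cut = cutoff_effects(scores, successful)
print(f"cut-off {cut:.1f}: {bad_cut:.0%} of unsuccessful and "
      f"{good_cut:.0%} of successful students eliminated")
```

Even in this toy example the cut-off removes one successful student along with two unsuccessful ones, the same kind of trade-off that led Borg to reject a fixed cut-off.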

After analyzing the results of his own predictive validity study and Educational Testing Service reports of other validity studies, Marston (1971) warned against the use of fixed cut-off scores. Marston attempted to predict the publication rates of sixty-four clinical and forty-seven non-clinical psychologists from the GRE scores they had earned before acceptance to graduate school. The correlations he found differed for the two groups (clinical r = .01; non-clinical r = .27, p > .09). However, the practical value of this information is unclear. Marston's study can be criticized for its small sample size as well as its unreliable and possibly unrealistic criterion.

When several predictors are relevant, a common practice is to establish cut-off scores on each of them. Using multiple cut-off scores eliminates individuals from consideration if their score on any one of the predictors is low. On the other hand, a multiple linear regression model allows high scores on one predictor to compensate for low scores on another. Compensation is desirable in situations where a strength in one area can make up for a weakness in another. Multiple cut-off scores are more appropriate in situations where a specific trait or prerequisite cannot be compensated for by other abilities (Cronbach, 1970, pp. 437-438).

One problem with the use of multiple cut-off scores is that there is no analytical method for establishing the minimum acceptable scores. Determining the effect of a single cut-off score is relatively easy, but with multiple cut-off scores the process is necessarily one of trial and error. The combined effect of multiple cut-off scores creates a non-linear selection model.

The  one  case  in  which  we  would  expect  those  who 
were selected by the multiple cutoff procedure to
surpass  in  criterion  performance  those  selected  by 
multiple  regression  is  that  in  which  the 
relationship  of  one  or  more  of  the  tests  to  the 
criterion  is  sharply  non-linear.  If  there  is  some 
unique  critical  score  on  a  particular  test  below 
which  all  or  most  applicants  do  poorly  on  the  job 
and  above  which  a  smaller  proportion  do  poorly  on 
the  job  no  matter  what  their  other  qualifications, 
then  a  procedure  which  determines  that  point  and 
establishes  a  fixed  cutoff  at  that  point 
undoubtedly  has  advantages.  However,  in  so  far  as 
a continuous and approximately linear relationship
exists  between  score  on  each  of  the  tests  and  the 
criterion  score  of  success  on  the  job,  no  basis 
exists  for  choosing  a  uniquely  desirable  cutting 
score.  (Thorndike,  1949,  p.  198) 


In addition, Thorndike points out that multiple cutoff scores provide no information about the degree of suitability of an applicant. In an environment where the intent is to select the best qualified, this method is not particularly useful (1949, p. 199).
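The contrast between multiple cut-offs and a compensatory composite can be made concrete with a small sketch. The cut-off values, weights, composite threshold, and applicant scores below are all hypothetical and chosen only for illustration; the weighted composite stands in for a fitted regression equation.

```python
# Hypothetical minimum scores for a multiple cut-off rule
CUTOFFS = {"GREV": 500, "GREQ": 500, "UGPA": 2.75}

# Hypothetical weights for a compensatory linear composite; in practice these
# would come from a regression of GGPA on the predictors
WEIGHTS = {"GREV": 0.001, "GREQ": 0.001, "UGPA": 0.5}
COMPOSITE_CUTOFF = 2.6


def passes_multiple_cutoffs(applicant):
    """Reject the applicant if any single score falls below its minimum."""
    return all(applicant[name] >= minimum for name, minimum in CUTOFFS.items())


def composite_score(applicant):
    """Weighted sum in which a high score can offset a low one."""
    return sum(weight * applicant[name] for name, weight in WEIGHTS.items())


# Strong quantitative score and grades, weak verbal score
applicant = {"GREV": 450, "GREQ": 720, "UGPA": 3.6}
print(passes_multiple_cutoffs(applicant))              # False: GREV is below 500
print(composite_score(applicant) >= COMPOSITE_CUTOFF)  # True
```

The same applicant is rejected under one rule and accepted under the other, which is the compensation Cronbach and Thorndike describe; whether it is desirable depends on whether the weak area is a true prerequisite.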


Other approaches

In a study of the GMAT, Breaugh and Mann (1981) used discriminant analysis to determine whether graduates of an MBA program could be statistically differentiated from non-graduates. The sample consisted of 507 graduate students. Of this group, 266 graduated, 193 voluntarily withdrew, and 48 were terminated for academic deficiency. The authors grouped all non-graduates together.

Breaugh and Mann were able to differentiate the two groups. Student age and GMAT Quantitative subtest scores were the most heavily weighted variables. Their method was 69% accurate in predicting graduation, compared with 52% accuracy for the admissions committee. The criteria used by the admissions committee were not mentioned, but statistics on the students were reported. For this sample, the mean UGPA was 2.98, the mean GMAT Verbal score was 31.8 (71st percentile), and the mean GMAT Quantitative score was 31.7 (also 71st percentile), indicating that the admissions committee had set "fairly high standards" for applicants (Breaugh and Mann, 1981).
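For readers unfamiliar with the technique, a two-group linear discriminant analysis can be sketched as follows. This is Fisher's linear discriminant with a pooled within-group covariance matrix, applied to simulated data; it is not a reconstruction of Breaugh and Mann's model, and the two predictors, group means, and group sizes are all hypothetical.

```python
import numpy as np


def fit_lda(X, groups):
    """Fit Fisher's two-group linear discriminant.

    X      -- n x k array of predictor scores
    groups -- length-n array of 0/1 labels (1 = graduated)
    Returns (w, threshold): classify as group 1 when X @ w > threshold.
    """
    X0, X1 = X[groups == 0], X[groups == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-group covariance matrix
    pooled = ((len(X0) - 1) * np.cov(X0, rowvar=False)
              + (len(X1) - 1) * np.cov(X1, rowvar=False)) / (len(X) - 2)
    w = np.linalg.solve(pooled, m1 - m0)
    # Midpoint between the projected group means (assumes equal prior odds)
    threshold = w @ (m0 + m1) / 2
    return w, threshold


# Hypothetical data: graduates score somewhat higher on both predictors
rng = np.random.default_rng(7)
non_grads = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(250, 2))
grads = rng.normal(loc=[1.0, 0.5], scale=1.0, size=(250, 2))
X = np.vstack([non_grads, grads])
groups = np.array([0] * 250 + [1] * 250)

w, threshold = fit_lda(X, groups)
predicted = (X @ w > threshold).astype(int)
accuracy = (predicted == groups).mean()
print(f"discriminant weights {np.round(w, 2)}, hit rate {accuracy:.0%}")
```

The hit rate here is computed on the same data used to fit the weights, so it is optimistic; a fair comparison with an admissions committee would score a separate holdout sample.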

Although the main focus here is on the GRE and GMAT as predictors of success, these variables are seldom used alone. The relationships between a number of other variables and GGPA have been investigated, and a review of the literature shows that undergraduate grade-point average is the most common of these. In an analysis of 189 validity studies conducted by ETS between 1973 and 1981, Livingston and Turner found that:

The  combination  of  GRE  scores  and  undergraduate 
grades  predicts  first  year  grades  much  more 
effectively  than  either  the  GRE  scores  alone  or 
undergraduate  grade-point  average  alone.  (1982) 

Several studies have tested background variables as predictors. Baird (1973) used questionnaires to obtain background, self-assessment, and GGPA information on over 2,000 graduate students. He determined that a student's confidence in his abilities and his family background were related to success in business and law schools.

Mehrabian (1969) investigated the effectiveness of a number of variables in predicting success for 266 applicants for admission to a graduate psychology program and 79 students already enrolled in that program. Four predictors (sex, the increase in GPA for the last two undergraduate years over the first two years, the rating of the student's undergraduate department, and research experience) were excluded from the final model because factor analysis showed that they did not relate significantly to the success criteria. The success criteria employed were an average evaluation of research competence, average grades in first-year statistics courses, and average grades in first-year content courses.

Although he found little evidence in the literature to support the use of letters of recommendation as predictors, Mehrabian reported that in his own study they were the second strongest predictor of graduate school success. The best single predictor Mehrabian found was the sum of the student's GRE and Miller Analogies Test (MAT) scores. Although he determined that UGPA over the last two years of undergraduate school had a stronger relationship to graduate performance than overall UGPA, this predictor was not strong enough to be included in his final model (Mehrabian, 1969).

Mehrabian's criteria are questionable, and his findings have not been replicated. In fact, Lin and Humphreys' study disagrees with Mehrabian's on one point. After analyzing patterns in the undergraduate and graduate school records of over 2,000 students, Lin and Humphreys stated: "There is no evidence that senior grades predict grades in graduate departments more accurately than freshman grades." (1977, p. 236)


Lewis examined the relationship between six predictor variables and two criterion measures in the MBA program at the University of Iowa. The predictors included the number of undergraduate semester hours in business-related courses, GPA in those courses, cumulative UGPA, undergraduate major, and Graduate Study in Business Test (GSBT) scores. (The GSBT was a forerunner of the GMAT.) Using stepwise regression, Lewis found that GPAs in business courses and scores on the quantitative portion of the GSBT were the best predictors of GGPA in required MBA courses. He was unable to find any predictors that correlated significantly with his second criterion, persistence in the MBA program (Lewis, 1964).

In a 1965 study, Mittman and Lewis investigated the relationships between the other five predictors used in Lewis' 1964 study and the criteria of GSBT Verbal and Quantitative scores. Stepwise regression revealed that the only background variable to correlate with the verbal test score was the number of undergraduate semester hours taken in business courses. This correlation coefficient was .65, significant at the .05 level. The relatively strong relationship between GSBT verbal scores and the number of undergraduate business courses suggests that the GSBT is a good achievement test. Undergraduate department and undergraduate major were also found to be significantly correlated with the quantitative test criterion (Mittman and Lewis, 1965).

If a criterion measure is not stable, not
consistent, it will be impossible for any test or
other predictor to relate well with it. (Womer,
1968)

Another problem with the criterion is the
possibility that it is biased in favor of certain
groups of people. If the criterion is affected by
factors unrelated to the attribute it is designed
to measure, it may be biased. (Womer, 1968)

Travers and Wallace advised that graduate schools monitor the stability of average grades from year to year and from department to department. They found that in one engineering school, useful prediction of GGPA was impossible because of its large variability (1950).
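Womer's point about unstable criteria has a standard quantitative form in classical test theory: the observed correlation between a predictor and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below applies that ceiling and the related classical correction for attenuation; the reliability values are hypothetical, not estimates for GGPA at any school.

```python
import math


def validity_ceiling(rel_predictor, rel_criterion):
    """Largest observed correlation possible given the two reliabilities."""
    return math.sqrt(rel_predictor * rel_criterion)


def disattenuated_r(observed_r, rel_predictor, rel_criterion):
    """Classical correction for attenuation: the correlation the measures
    would show if both were perfectly reliable."""
    return observed_r / validity_ceiling(rel_predictor, rel_criterion)


# Hypothetical reliabilities: a test at .90 and a grade average at .40
print(round(validity_ceiling(0.90, 0.40), 2))       # 0.6
print(round(disattenuated_r(0.30, 0.90, 0.40), 2))  # 0.5
```

Under these assumed reliabilities, even a predictor that measured exactly what the criterion measures could show an observed correlation no higher than .60.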


Departmental differences

Madaus and Walsh (1965) investigated the differences between departments. They found that department sizes were strongly related to the sizes of the correlation coefficients achieved for those departments. They observed higher correlations in larger departments than in smaller ones, or in the university as a whole. When GRE Verbal and Quantitative scores were used to predict GGPA, correlations ranged from .22 for the entire sample of 569 students to .69 for a single department (N = 68). Based on their findings, the researchers wrote:

It  would  appear,  therefore,  that  the  size  of  N 
is  a  definite  factor  relative  to  whether  or  not  a 
significant  relationship  is  found  between  the 
dependent  and  independent  variables.  The  findings 
of  this  study  lead  one  to  the  conclusion  that  the 
practice  of  grouping  departments  for  predictive 
purposes  should  not  be  employed.  No  matter  how 
logical  the  grouping  appears  to  be,  the  results  are 
likely to be of limited utility. (Madaus and Walsh,
1965)

Grouping departments on the basis of judgment to increase sample size appears to be counterproductive. Jensen recognized that differences between departments occur because student abilities and grading practices vary between departments. If the groups differ statistically with respect to the relevant variables, variability tends to increase and correlation coefficients tend to decrease (Jensen, 1953). On the other hand, if the differences between groups were tested using statistical methods and groups of similar programs formed, the correlation coefficients should not decrease significantly.
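One conventional way to test whether two programs' validity coefficients differ is to compare them after Fisher's r-to-z transformation. The sketch below assumes the two correlations come from independent samples (different students), which holds for separate programs but not for a department compared with a university total that contains it. The correlations and sample sizes in the example are hypothetical.

```python
import math


def normal_two_tailed_p(z):
    """Two-tailed p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def compare_independent_rs(r1, n1, r2, n2):
    """Test H0: rho1 == rho2 for correlations from two independent samples.

    Uses Fisher's z = atanh(r), whose standard error is 1 / sqrt(n - 3).
    Returns (z statistic, two-tailed p-value).
    """
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (math.atanh(r1) - math.atanh(r2)) / se
    return z, normal_two_tailed_p(z)


# Hypothetical programs: r = .55 with 60 students versus r = .20 with 80 students
z, p = compare_independent_rs(0.55, 60, 0.20, 80)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Programs whose coefficients do not differ significantly under a test of this kind would be candidates for pooling into a common prediction group.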

In their review of 189 GRE validity studies, Livingston and Turner observed that within the group of 41 departments having fewer than 25 students, variations in the correlation coefficients were noticeably large. These variations occurred both between departments and within the same department from year to year. Their analysis of these variations led the authors to state:

Individual departments differ widely in the
correlations of GRE scores with FYA, but these
differences may mainly be the result of small
sample instability. (1982)

FYA in the previous quotation refers to first-year graduate grade-point average. It seems likely that differences in departmental grading criteria were partly responsible for the inter-departmental differences, and that the effect of small-sample instability is better shown in the year-to-year intra-departmental fluctuations.
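How unstable a correlation from a small department can be is easy to see from an approximate 95 percent confidence interval, computed through Fisher's z transformation. The observed correlation of .30 and the two sample sizes below are hypothetical.

```python
import math


def r_confidence_interval(r, n, z_crit=1.96):
    """Approximate 95% confidence interval for a correlation via Fisher's z."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


for n in (20, 200):
    low, high = r_confidence_interval(0.30, n)
    print(f"n = {n:3d}: r = .30, 95% CI from {low:.2f} to {high:.2f}")
```

With 20 students the interval runs from below zero to about .66, so the same observed coefficient would not even be statistically significant; with 200 students it is confined to roughly .17 to .42.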

Lin and Humphreys used a sample selection strategy to reduce the effects of differential grading standards. They selected three particular graduate departments because:

they attract somewhat similar students, have
large numbers of graduate students, and have
faculties that have more or less maintained
reliable and valid standards of graduate and
undergraduate grading. This last criteria barred a
large number of departments from consideration.
(1977, p. 250)

They found that the academic performance of students with high test scores and UGPAs was more stable than that of students with lower test scores and grades, and that the performance of better students was more predictable (1977, p. 252).

The  selection  ratio 

Prediction of academic performance is an important topic for research. While the consequences of inaccurate decisions or policies are serious, accurate prediction of success in graduate school is elusive. The variability of undergraduate and graduate grades, the effects of restriction in range, and small sample sizes have consistently been cited as factors contributing to prediction problems. Researchers have investigated a number of predictors and criteria with mixed results. In general, background variables have had, at best, moderate correlations with GGPA. The most commonly chosen predictors (UGPA, GRE test scores, and GMAT test scores) have usually demonstrated statistically significant relationships with GGPA, but have seldom yielded correlations as high as researchers consider necessary. In this respect, many promising research efforts have been abandoned too quickly. As Brogden (1946) demonstrated, even a small improvement in selection can be valuable in many situations.

A critical element in determining the value of criterion-related validity research has gone unmentioned by many researchers. This element is the selection ratio, computed by dividing the number of selected applicants by the total number of applicants. Taylor and Russell (1939) demonstrated convincingly that the usefulness of tests with a validity of less than .70 increases more and more as the selection ratio becomes smaller.

They developed a series of tables that depict the relationships between the selection ratio, the proportion of individuals rated satisfactory (before the use of a predictor or prediction model), and the validity of the predictor or prediction model. By using the appropriate table, a researcher can estimate the benefits that can be derived from the use of a predictor or prediction model, based on a validity estimate that reflects the influence of the selection ratio. For example, if 50% of the present students in a graduate school were successful, a selection ratio of .5 was used, and the validity of a new test was .6, the tables show that 70% of those selected would be successful. The increase in the proportion of successful students from 50%, which would be expected if all applicants were admitted, to 70% with a test of only moderate validity shows how much even a modest predictor can add; at smaller selection ratios the gain is larger still (Taylor and Russell, 1939, pp. 570-578).
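The values in the Taylor-Russell tables can be approximated directly, assuming, as those tables do, that predictor and criterion scores are bivariate normal. The sketch below computes the expected proportion of successful students among those selected by numerically integrating the joint normal density. It uses only the Python standard library; the function name and the number of integration steps are arbitrary choices, and validity must be below 1 and the selection ratio strictly between 0 and 1.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def taylor_russell(base_rate, selection_ratio, validity, steps=4000):
    """Expected proportion successful among selected applicants.

    base_rate       -- proportion successful without using the predictor
    selection_ratio -- proportion of applicants selected (top scorers), 0 < SR < 1
    validity        -- predictor-criterion correlation, |validity| < 1
    """
    x_cut = STD_NORMAL.inv_cdf(1 - selection_ratio)  # predictor cut-off
    y_cut = STD_NORMAL.inv_cdf(1 - base_rate)        # criterion cut-off for "success"
    slope = validity / math.sqrt(1 - validity ** 2)
    shift = y_cut / math.sqrt(1 - validity ** 2)
    # P(X > x_cut and Y > y_cut) = integral over x > x_cut of
    #   phi(x) * P(Y > y_cut | X = x), evaluated with the midpoint rule;
    # the range extends far enough into the tail that the remainder is negligible
    upper = x_cut + 10.0
    width = (upper - x_cut) / steps
    joint = 0.0
    for i in range(steps):
        x = x_cut + (i + 0.5) * width
        joint += STD_NORMAL.pdf(x) * STD_NORMAL.cdf(slope * x - shift) * width
    return joint / selection_ratio


print(round(taylor_russell(0.50, 0.50, 0.60), 2))  # 0.7
print(round(taylor_russell(0.50, 0.05, 0.60), 2))
```

For a base rate of .50, a selection ratio of .50, and a validity of .60, the result should agree to two decimal places with the corresponding Taylor-Russell table entry; the second call shows the larger proportion of successes obtained when only the top 5 percent of applicants are selected.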

Summary 

A wide range of approaches has been used in criterion-related validity studies. A lack of agreement concerning the relevant variables and the appropriate techniques for analyzing their interrelationships has resulted in a large number of exploratory investigations and few in-depth studies. Even when research has identified promising techniques or potentially important variables, later researchers have seldom attempted to incorporate them in their own studies. The results paint a clear picture of what has not worked in a variety of specific situations, but leave only a vague impression of what may be useful in general application.

It is clear that there is room for improvement in the prediction of graduate school success. It is equally apparent that reliance on published data to support the use of a particular test or prediction model cannot be justified. There are important differences between schools, and between departments within them, that make local validity research necessary. Nearly every researcher has agreed on one point: continuing research and empirical studies of criterion-related validity are needed. Based on the material reported here, these recommendations, at least, appear to be valid.




Research  hypotheses 

1.  The  correlations  of  the  predictor  variables  with  GGPA 
vary  between  AFIT  master's  degree  programs.  In  at  least 
some  cases  the  differences  between  program  correlation 
coefficients  are  statistically  significant. 

2.  The  correlations  computed  for  the  entire  sample  are 
lower  than  at  least  some  of  those  computed  for  individual 
programs. 

3. When groups are formed based on statistically similar predictor/criterion relationships, and multiple regression models are developed for those groups, the prediction models developed for the groups contain different sets of predictors and different predictor weights.

4. Graduate Record Examinations test scores, Graduate Management Admission Test scores, and undergraduate grade-point average are valid predictors of graduate grade-point average.

5. Background variables such as commissioned years of service (CYRS), enlisted years of service (EYRS), and number of undergraduate math courses (NMAT) add to the accuracy of one or more of the prediction models.

6.  The  models  developed  in  this  study  are  more  accurate 
than  AFIT's  current  selection  procedures. 




CHAPTER  II 


METHODS 


Subjects

The  subjects  in  this  study  include  all  resident 
AFIT  master's  degree  students  in  the  School  of  Systems  and 
Logistics  and  the  School  of  Engineering  who  attended  AFIT 
between  1977  and  1982,  inclusive.  The  information 
collected  includes  relevant  predictor,  criterion,  and 
biographical data for approximately 98% of the total
population group. The total data base includes 2170 cases.
Demographic  information  is  contained  in  Appendix  A. 


For  convenience,  abbreviated  variable  names  will  be 
used  throughout  the  remainder  of  this  thesis.  The  variable 
names  are  defined  below. 


GMTT  GMAT  composite  score 

GMTV  GMAT  Verbal  subtest  score 

GMTQ  GMAT  Quantitative  subtest  score 

GRET  The sum of the GRE verbal and quantitative subtests

GREV  GRE Verbal subtest score

GREQ  GRE Quantitative subtest score

GREA  GRE Analytical subtest score

EYRS  Enlisted years of service

CYRS  Commissioned years of service

NMAT  Number of undergraduate mathematics courses

TOEF  Test of English as a Foreign Language score

UGPA  Undergraduate grade-point average

GGPA  Graduate grade-point average




In the first step of the analysis, correlation
matrices containing all of the variables were calculated
for the entire sample and for each of the 17 AFIT resident
master's degree programs using the Statistical Package for
the Social Sciences (SPSS) Pearson Corr program (Nie et
al., 1973). The matrices were calculated using pair-wise
deletion of missing values so that each correlation
coefficient would be based on the largest possible sample
size. This was necessary because the number of cases with
missing values was very large. For example, only 1330 of
the 2170 cases (61.3%) contained GRE data.
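Pair-wise deletion can be illustrated with a short sketch. This is not the SPSS routine; it is a minimal Python version, assuming missing scores are stored as None, that computes each Pearson coefficient from only the cases that have both values. As a result, different coefficients in the same matrix can rest on different sample sizes, as they do in the tables that follow. The data in the example are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple


def pairwise_pearson(x: Sequence[Optional[float]],
                     y: Sequence[Optional[float]]) -> Tuple[float, int]:
    """Pearson r computed from the cases where both x and y are present.

    Returns (r, n), where n is the number of complete pairs actually used.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 3:
        raise ValueError("need at least three complete pairs")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


# Hypothetical records: GRE totals are missing for two of the six cases.
gret = [1100, None, 1250, 980, None, 1320]
ggpa = [3.4, 3.1, 3.7, 3.2, 3.5, 3.8]
r, n = pairwise_pearson(gret, ggpa)  # only the 4 complete pairs are used
print(round(r, 3), n)
```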

The data base contained information on
non-graduates, late graduates, and graduates; however, it
did not contain information on applicants who had not been
selected for AFIT resident master's degree programs.
Because the selected group and the non-selected group
differ on the variables used in making selection decisions
(the selected group scores higher and varies less), it is
necessary to consider the effects of restriction in range.
Restriction in range attenuates the correlation
coefficients between the predictors and the criterion. In
cases where only a small proportion of applicants are
selected, the attenuation can be substantial (Thorndike,
1949, pp. 169-176). In the groups studied here, there is a
direct restriction on the predictor variables as a result
of the selection process. The initial correlation
coefficients were corrected to take this attenuation into
account using the formula derived by Thorndike (1949,
p. 173).
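The kind of correction described above can be sketched as follows. This uses the standard formula for direct selection on the predictor (commonly called Thorndike's Case 2); treating it as exactly the form on p. 173 is an assumption. The applicant-pool standard deviation in the example is hypothetical. Because the data base contained no non-selected applicants, that value would have to come from another source.

```python
import math


def correct_for_restriction(r: float, sd_applicants: float,
                            sd_selected: float) -> float:
    """Correct a predictor/criterion r for direct restriction on the predictor.

    r             : correlation observed in the selected (restricted) group
    sd_applicants : predictor SD in the unrestricted applicant pool (assumed known)
    sd_selected   : predictor SD in the selected group
    """
    k = sd_applicants / sd_selected  # ratio of unrestricted to restricted SD
    return r * k / math.sqrt(1.0 - r ** 2 + (r ** 2) * (k ** 2))


# Hypothetical case: r = .30 among selected students whose predictor SD
# is half that of the applicant pool.
print(round(correct_for_restriction(0.30, 100.0, 50.0), 3))  # prints 0.532
```

When the two standard deviations are equal, the formula returns the observed r unchanged. The correction grows as the selected group becomes more homogeneous than the pool.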

Frequently, problems associated with small sample
instability have been mentioned as limiting factors in
criterion-related validity research. The range of
correlation coefficients for the master's programs studied
here was large, indicating that the same problems may be
present. To reduce the effects of small sample
instability, an effort was made to determine whether or not
some of the programs could be combined to form larger, but
still homogeneous, groups. A preliminary inspection of
the correlation matrices showed that only a few of the
predictor variables consistently correlated with GGPA at the
.05 significance level in more than half of the 17
programs. The matrices were examined to determine which of
the variables were significantly related to the criterion
in the largest number of programs, with the following
results:


PREDICTOR        NUMBER OF PROGRAMS
VARIABLE NAME    SIGNIFICANT

GRET             13
GREQ             13
UGPA             10
GREV              6
EYRS              6
GREA              5
CYRS              3
GMTQ              3
NMAT              3
GMTT              2
GMTV              2
TOEF              0

Because of missing data among the other predictors, the
subset containing GRET, GREQ, GREV, and UGPA was used as the
basis for comparing the predictor/criterion relationships
between programs. Statistically significant
predictor/criterion relationships were compared across
programs using the method outlined in Cohen and Cohen
(1975, pp. 30-52). Because the sampling distribution of
non-zero correlation coefficients is skewed, it was
necessary to use Fisher's Z transformation to convert the
distribution of independent correlation coefficients to a
nearly normal distribution. The transformed values were
tested using a procedure very similar to a t-test (Cohen
and Cohen, 1975).
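The comparison of two independent correlations can be sketched in a few lines. Each r is converted with Fisher's z = atanh(r), and the difference is divided by its standard error, sqrt(1/(n1 - 3) + 1/(n2 - 3)). The result is referred to the standard normal distribution. This is a minimal sketch of the textbook test, not a reproduction of the thesis's computations. The example uses the two program correlations with GGPA quoted in Chapter III for Aeronautical Engineering (-.447, N = 115) and Systems Engineering (.674, N = 34).

```python
import math
from typing import Tuple


def compare_independent_rs(r1: float, n1: int,
                           r2: float, n2: int) -> Tuple[float, float]:
    """Test H0: rho1 == rho2 for correlations from two independent samples.

    Returns (z statistic, two-tailed p-value) using Fisher's r-to-z transform.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher's Z transformation
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))  # = 2 * (1 - Phi(|z|))
    return z, p_two_tailed


z, p = compare_independent_rs(-0.447, 115, 0.674, 34)
print(round(z, 2), p)
```

A p-value below .05 would mark the two programs as differing on that predictor/criterion relationship. Programs were grouped only where such differences were absent for two or more predictors.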

The observed significance levels (p-values)
calculated in this process were tabulated in matrix form.
The table of p-values can be found in Appendix B. Although
the number of possible program combinations was large, the
requirement that the programs be similar with respect to at
least two of their predictor/criterion correlations
eliminated a great number of possible combinations. In the
end, five homogeneous program groups were formed. In these
groups the predictor/criterion correlations for two or
more predictors were not significantly different at the .05
level. Correlation matrices for these five program groups,
and for the entire sample, can be found in Appendix C. The
resulting program groups are reported in Chapter III.

Developing prediction models

Stepwise  multiple  regression  was  used  to  calculate 
prediction  models  for  each  of  the  five  groups.  This 
method  has  the  advantage  of  weighting  each  predictor  in 
direct  proportion  to  its  correlation  with  the  criterion  and 
in  inverse  proportion  to  its  correlation  with  other 
predictors.  The  highest  weight  is  assigned  to  the 
predictor  with  the  highest  validity  and  the  least  overlap 
with  other  predictors  in  the  model.  Since  optimum  weights 
are  developed  for  each  predictor,  the  multiple  correlation 
coefficient  that  results  has  the  highest  validity  that  is 
possible for that set of predictors (Anastasi, 1976,
pp. 180-183).
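The weighting described above can be seen directly in a correlation matrix. The sketch below solves Rxx b = rxy for standardized (beta) weights and reports the multiple correlation R = sqrt(b . rxy). It uses hypothetical correlations and standardized rather than raw-score weights, and it is not the SPSS stepwise routine. In the example, the predictor with the higher validity receives the larger weight. Both weights fall below the raw validities because the predictors overlap.

```python
import math
from typing import List, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def standardized_weights(rxx: List[List[float]],
                         rxy: List[float]) -> Tuple[List[float], float]:
    """Least-squares beta weights and the multiple correlation R."""
    beta = solve(rxx, rxy)
    r_squared = sum(bw * r for bw, r in zip(beta, rxy))
    return beta, math.sqrt(r_squared)


# Hypothetical example: validities .50 and .40, predictor intercorrelation .30.
rxx = [[1.0, 0.3], [0.3, 1.0]]
rxy = [0.5, 0.4]
beta, R = standardized_weights(rxx, rxy)
print([round(bw, 3) for bw in beta], round(R, 3))
```

A correlation matrix built with pair-wise deletion need not be positive definite, so a practical version would check for that before solving.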




Since some of the independent variables used in the
regression models were highly intercorrelated, the
likelihood of multicollinearity inducing a blocking effect
on the introduction of subsequent independent variables
into the model had to be considered. To prevent a variable
that was highly correlated with both the dependent variable
and the other independent variables from reducing the
overall multiple correlation coefficient, the independent
variables were systematically dropped from the equation,
one at a time. This procedure has been suggested both as a
means to identify a multicollinearity problem, if one
exists, and to eliminate its effects on the calculation of
a regression equation (Nie et al., 1973, pp. 340-341).

The "best" model for each of the groups was chosen
based on a comparison of multiple correlation coefficients.
These models are reported in Chapter III.




CHAPTER  III 


RESULTS 

This chapter contains four sections. In section
one, evidence supporting the validity of the predictor
variables is presented. Section two contains a brief
analysis of the validity of the procedure currently used in
selecting AFIT students, and reports the outcome of this
procedure. In the third section the prediction models
developed in this research project are listed, and their
usefulness is discussed. The fourth section is a short
economic analysis of the benefits that could result from
the use of the prediction models developed in this study.


Validity


The correlations between each of the twelve
predictor variables and GGPA are shown in Table 1. These
correlation coefficients were computed for the entire
sample (N = 2170), but because data on some of the
variables were missing from many of the cases, the
individual correlations are based on smaller sample sizes.
For some of the variables the reduction in sample size is
very large. A full correlation matrix can be found in
Appendix C.


Table 1

Correlations of predictors with GGPA (entire sample)

VARIABLE    CORRELATION    SAMPLE SIZE    SIGNIFICANCE
GMTT           .440            386            0.00
GMTV           .465            381            0.00
GMTQ           .285            381            0.00
GRET           .315           1330            0.00
GREV           .163           1330            0.00
GREQ           .351           1330            0.00
GREA           .401            456            0.00
EYRS           -.31            342            0.00
CYRS           .191           1976            0.00
NMAT           -.05           2090            0.03
TOEF           .402             28            0.06
UGPA           .187           2168            0.00

Table 1 shows that the GMAT tests and the GRE
Analytical test are correlated with GGPA when all AFIT
master's degree programs are grouped together. In
addition, it shows that their correlations with GGPA are
generally stronger than those of GRET, GREV, and UGPA
(GMTQ, at .285, falls below GRET), which are being used by
the AFIT Registrar's office as the primary indicators for
the engineering master's programs, and as alternates for
the logistics school programs.


Comparing  the  correlations 


The correlations reported in Table 1 were based on
a sample containing 17 different master's degree programs.
It is logical to assume that they represent a middle ground
between the highest and lowest correlations found in
individual programs. A comparison of the correlation
coefficients that were calculated for the 17 master's
degree programs supports this hypothesis. Substantial
differences in the relationships between the predictor
variables and GGPA were observed, even when programs that
appear to be somewhat similar on the surface were compared.
For example, correlation coefficients for GREV with GGPA
ranged from -.447 (N = 115) in the Aeronautical Engineering
program to .674 (N = 34) in the Systems Engineering
program.

Some of the differences in correlations can be
attributed to the instability of correlation coefficients
in small samples, although most of the sample sizes
reported here for individual programs are equal to or
larger than those commonly reported in the literature.
Sample correlation coefficients for each of the
predictor/criterion relationships were compared with the
object of combining programs into statistically similar
groups. As a result of this process, 15 of the 17 programs
were combined into 5 groups. Members of each of these
groups had correlation coefficients for two or more
predictor/criterion relationships that were not
significantly different.

This process demonstrated that some programs could
be grouped together to reduce the effects of small sample
instability without significantly degrading the
predictor/criterion relationships observed in the
individual programs. It also added support to the
hypothesis that statistical combination of groups would
reveal similarities that are not intuitively obvious.


Predictor/criterion  correlations  for  program  groups 

Tables 2 through 6 show the correlations between the
relevant predictors and GGPA for each of these groups.
These correlations demonstrate the validity of the
predictor/criterion relationships in the program groups.
With the exceptions of GRET, GREV, GREQ, and UGPA, which are
reported in every case for purposes of comparison,
predictors that did not correlate with GGPA at the .10
significance level are not included in the tables.


Table 2

Correlations of predictors with GGPA (Group #1)

ASTRONAUTICAL ENGINEERING
SYSTEMS MANAGEMENT
SYSTEMS ENGINEERING

VARIABLE    CORRELATION    SAMPLE SIZE    SIGNIFICANCE
GRET           .638            167            0.00
GREV           .538            167            0.00
GREQ           .622            167            0.00
UGPA           -.05            296            0.27

Table 3

Correlations of predictors with GGPA (Group #2)

STRATEGY AND TACTICS (O.R.)
ELECTRICAL ENGINEERING OPTICS
ELECTRICAL ENGINEERING

VARIABLE    CORRELATION    SAMPLE SIZE    SIGNIFICANCE
GRET           .308            283            0.00
GREV           .12?            285            0.06
GREQ           .367            283            0.00
GREA           .163            117            0.10
CYRS           .167            422            0.00
UGPA           .341            429            0.00

Table 4

Correlations of predictors with GGPA (Group #3)

LOGISTICS MANAGEMENT
ENGINEERING MANAGEMENT
CONTRACTING MANAGEMENT
ACQUISITION MANAGEMENT

VARIABLE    CORRELATION    SAMPLE SIZE    SIGNIFICANCE
GRET           .372            515            0.00
GREV           .233            515            0.00
GREQ           .324            515            0.00
GREA           .531            166            0.00
CYRS           .139            457            0.01
NMAT           .160            484            0.01
UGPA           .158            470            0.14


Table 5

Correlations of predictors with GGPA (Group #4)

AERONAUTICAL ENGINEERING
ENGINEERING PHYSICS
OPERATIONS RESEARCH

VARIABLE    CORRELATION    SAMPLE SIZE    SIGNIFICANCE
GRET           .308            233            0.00
GREV           .12?            283            0.06
GREQ           .367            283            0.00
GREA           .163            117            0.10
CYRS           .167            270            0.02
UGPA           .341            277            0.00


Table 6

Correlations of predictors with GGPA (Group #5)

COMPUTER SCIENCE
NUCLEAR EFFECTS ENGINEERING

VARIABLE    CORRELATION    SAMPLE SIZE    SIGNIFICANCE
GRET           .322            142            0.00
GREV           .010            142            0.00
GREQ           .492            142            0.00
UGPA           .273            181            0.00

Description of present admissions procedures

The Air Force uses a three-step process in
screening potential students for programs under AFIT's
jurisdiction. In the first step, academic records are
reviewed by AFIT's evaluators and the names of all
academically eligible officers are transmitted to the Air
Force Military Personnel Center (MPC). AFIT's academic
evaluation is a continuous process. Since AFIT is the
repository for all active duty Air Force officer
educational records, these records are forwarded to AFIT
shortly after an officer is commissioned. When AFIT
receives them, the records are screened to determine
whether or not the officer meets the eligibility criteria
for admission to the AFIT programs that are related to
his/her career field or past academic experience.


Those officers whose academic records are above
average will normally be classified as eligible for AFIT
programs as a result of the initial evaluation. In this
manner, officers who have not formally applied for
admission are "centrally identified." Officers may also
become eligible for AFIT programs by requesting evaluation
(volunteering). AFIT's position is that volunteers are
better motivated to succeed in AFIT graduate programs.
These individuals need not have above average academic
records, but they must meet AFIT's minimum criteria. AFIT
provides educational counseling to volunteers who do not
meet eligibility criteria. If additional transcripts
showing that deficiencies have been corrected are forwarded
to AFIT at a later date, the officer's records are
re-evaluated and eligibility may be granted at that time.
The names of all officers qualified by either of these
processes are placed on an AFIT eligibility listing.
Updated versions of this computer listing are transmitted
to MPC periodically (Bigelow, 1983).

The current listing shows that approximately 13,000
officers have attained eligibility status through the
processes described above (Air Force Institute of
Technology, 1983). This can be contrasted with the number
of Air Force officers who have not yet earned a master's
degree. According to Air Force Magazine (May, 1983),
there are 31,190 line officers in this category. Of that
total, only 23.3% are included in the group that AFIT
considers eligible.

Although the minimum eligibility criteria vary from
program to program, in general they consist of the
following:

1. Undergraduate GPA of 2.3 or higher.

2. GRE verbal and quantitative test scores
totaling 1000. GMAT scores of 500 or better are preferred
for some programs, but GRE scores are acceptable.

3. A minimum number of math courses (depending on
degree type).

4. Grades of "C" or better in required courses.

(U.S. Air Force Manual 50-5, Volume I, para 4-15, 4-16,
1981)


In the second step of the process, career field
managers at MPC review the military records of eligible
officers under their purview to determine which of them are
available for an assignment, have the required job
experience, and have acceptable performance ratings. Once
this review is completed, selection folders containing the
relevant portions of the academic and military records of
the officers eligible for AFIT are prepared for review by
MPC's selection board. Since each of the career field
managers acts independently, and has a different quota to
fill, it is doubtful that this part of the screening
process is conducted uniformly. Minimum criteria are
specified by Air Force Manual 50-5, Volume I, para 4-15
(a):

     a. Military Availability. Officers must:

          (1) Be medically unrestricted for
          worldwide duty.

          (2) Be serving in the grade of colonel or
          below.

          (3) Have a competitive military record.

          (4) Be available for reassignment.

          (5) Have at least 3 years intervening
          service since last PCS education on the date of
          class entry. (United States Air Force, 1981)

In addition to these requirements, officers must also meet
the following criteria, which are among those specified in
AFM 50-5, Volume I, para 4-15 (c):

     c. Assignment availability:

          (1) On-station requirements:

               (a) Normally, the AFIT entry date
               provides for a minimum of 24 months on station
               before school entry.

               (b) Officers serving on or projected
               to serve on overseas tours are scheduled for school
               entry to coincide with their DEROS. (United States
               Air Force, 1981)

The final phase of the screening process occurs
when the officer's military and educational records are
evaluated by a selection board of senior officers.
According to AFM 50-5, Volume I:

     A continuous selection board convenes beginning
     in July (each year) to consider line of the Air
     Force applicants and centrally identified officers
     below the rank of colonel for AFIT entry during the
     next fiscal year. Out of cycle selections are made
     throughout the year from late volunteers and PCS
     available officers to fill any remaining
     vacancies. (para 4-22 (a), 1981)

     The selection process is highly competitive and
     considers overall academic military performance and
     post-AFIT assignment suitability. Factors include
     promotability, career progression, prior academic
     and assignment experience and the qualifications of
     the individual to perform in positions requiring
     the education to be obtained through AFIT. The
     selection process is designed to select officers
     whose potential contribution after graduation will
     most benefit the Air Force. (para 4-22 (b), 1981)

This board functions differently from a military promotion
board, and is closely related to the assignment process.
MPC's career field managers, whose primary interest is in
the assignment process, have a significant influence on
AFIT selection board decisions (Bigelow, 1983).

Determining  the  procedure's  validity 

The number of people involved in the screening
process makes analyzing the current procedures a difficult
task. For the purpose of this thesis, analyzing the result
of the process is a better starting point. If success at
AFIT is defined in terms of graduation on time, the
data collected in this study show that 90.4% of those
selected for AFIT meet that criterion. Of those who did
not graduate on time, 26.9% eventually completed their
degree requirements. In other words, 92.99% of those who
attended AFIT resident master's degree programs between
1977 and 1982 (inclusive) have completed graduation
requirements. This is a very respectable figure when
compared to the graduation rates normally found in civilian
graduate institutions. However, there are other factors
that must be considered.

The selection ratio has a direct bearing on the
results of a selection process. During the period of 1977
to 1982 inclusive, an average of 362 students were selected
for AFIT resident master's degree programs each year.
Assuming that the number of eligibles has remained fairly
constant over that period, a useful estimate of the
selection ratio for this time period is 362/13,000, or 2.7%.
The actual selection ratio must be lower than 2.7% because
that estimate includes only officers who are eligible for
AFIT (23.3% of the population). A selection ratio this
small substantially raises the proportion of successful
students among those selected. Given an estimate of the
graduation rate that would have occurred had no screening
process been used, it is possible to estimate the validity
of the selection process itself.

The graduation rate that would have occurred had
there been no selection was estimated at 69%. This figure
assumes that essential undergraduate prerequisite courses
or course sequences had been completed and that the
applicant's undergraduate degree is in the required field
to qualify him/her for graduate study in an AFIT resident
master's program. It does not reflect the use of cut-off
scores for the GRE or GMAT tests or for UGPA. The method used
to estimate the graduation rate is shown in Appendix D.

Using Taylor-Russell tables, the validity of the
Air Force's selection process was estimated at .33. The
fact that this level of validity can produce a 90.4%
graduation rate demonstrates the benefits that result from
use of a very low selection ratio. Further examination of
the Taylor-Russell tables shows that a selection model with
a validity of .65 or better would increase the graduation
rate to 99%. Relevant Taylor-Russell tables are contained
in Appendix E.
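Taylor-Russell tables give the expected proportion of successful selectees for a given validity, base rate, and selection ratio. They assume that predictor and criterion scores are bivariate normal. A minimal Python sketch of that computation follows. It uses Simpson's-rule integration with the standard library's `statistics.NormalDist` (Python 3.8+) and plugs in the estimates above: a base rate of .69 and a selection ratio of about .027. It illustrates the table logic; it does not reproduce Appendix E, and its figures are only as good as the normality assumption and the estimated base rate.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def taylor_russell(validity: float, base_rate: float, selection_ratio: float,
                   steps: int = 2000) -> float:
    """Expected proportion of selected applicants who succeed (bivariate normal model).

    Applicants are selected when their predictor z-score exceeds x_c (the top
    `selection_ratio` of the pool) and succeed when their criterion z-score
    exceeds y_c (chosen so that `base_rate` of the whole pool would succeed).
    Requires |validity| < 1 and an even number of steps.
    """
    x_c = STD_NORMAL.inv_cdf(1.0 - selection_ratio)
    y_c = STD_NORMAL.inv_cdf(1.0 - base_rate)
    s = math.sqrt(1.0 - validity ** 2)  # SD of criterion given the predictor

    def integrand(x: float) -> float:
        # predictor density times P(criterion > y_c | predictor = x)
        return STD_NORMAL.pdf(x) * (1.0 - STD_NORMAL.cdf((y_c - validity * x) / s))

    upper = x_c + 10.0           # normal tail beyond this point is negligible
    h = (upper - x_c) / steps
    total = integrand(x_c) + integrand(upper)
    for i in range(1, steps):    # Simpson's rule weights: 4, 2, 4, ..., 4
        total += (4 if i % 2 else 2) * integrand(x_c + i * h)
    joint = total * h / 3.0      # P(selected and successful)
    return joint / selection_ratio


for r in (0.0, 0.33, 0.65):
    print(r, round(taylor_russell(r, base_rate=0.69, selection_ratio=0.027), 3))
```

With zero validity the function returns the base rate, as it should. Raising the validity with the selection ratio held fixed moves the expected success rate toward 1.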

In the first phase of its selection process the Air
Force uses multiple-criteria cut-off scores to screen
applicants. Selection procedures of this kind may be
useful when the number of applicants is large and the
evaluation methods are relatively inexpensive, but they have
drawbacks. The effect of the multiple cut-off scores is
to eliminate individuals from consideration on the basis of
subjectively chosen criteria and weights, rather than more
objective, statistically derived formulae. Because a low
score on one criterion cannot be offset by a high score on
another, any eligibility criterion that is set too high, or
is irrelevant, can exclude a significant portion of
potentially successful applicants.
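The difference between a multiple cut-off rule and a linear composite can be shown with two hypothetical applicants. The sketch below applies the general cut-offs listed earlier (UGPA of 2.3, GRE verbal plus quantitative total of 1000) and the raw-score weights printed in Table 9. Both applicants are invented for illustration, and using the Table 9 equation on them is purely an example. Applicant A fails the UGPA cut-off but has a higher predicted GGPA than applicant B, who passes every cut-off.

```python
from typing import Dict

# General eligibility cut-offs as listed earlier in this chapter.
MIN_UGPA = 2.3
MIN_GRET = 1000

# Raw-score weights and constant as printed in Table 9 (entire sample, GRE cases).
TABLE9_WEIGHTS: Dict[str, float] = {
    "GREA": 0.001622894,
    "UGPA": 0.139886900,
    "GREV": -0.001924470,
    "GRET": 0.001060860,
}
TABLE9_CONSTANT = 1.941677000


def passes_cutoffs(a: Dict[str, float]) -> bool:
    """Multiple cut-off rule: every criterion must be met separately."""
    return a["UGPA"] >= MIN_UGPA and a["GRET"] >= MIN_GRET


def predicted_ggpa(a: Dict[str, float]) -> float:
    """Linear composite: strengths on one predictor can offset weaknesses on another."""
    return TABLE9_CONSTANT + sum(w * a[name] for name, w in TABLE9_WEIGHTS.items())


# Two hypothetical applicants (values invented for illustration).
applicants = {
    "A": {"UGPA": 2.2, "GREV": 550, "GRET": 1250, "GREA": 650},
    "B": {"UGPA": 2.4, "GREV": 500, "GRET": 1000, "GREA": 400},
}
for label, a in applicants.items():
    print(label, passes_cutoffs(a), round(predicted_ggpa(a), 2))
```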

The large number of missing values in the data
indicates that the multiple cut-off score criteria are not
being applied uniformly. An applicant who formally
requests that his/her eligibility for AFIT programs be
evaluated is required to submit GRE or GMAT scores. Other
officers, whose initial eligibility was determined based on
the other criteria (i.e., those that were centrally
selected), may be selected for AFIT without consideration
of GRE/GMAT scores. This situation occurs because "the
AFIT selection cycle does not always tie in with the ETS
testing cycle," according to Mr. C. P. Bigelow, Chief of
AFIT's Evaluation and Counseling Section (1983).

The academic evaluation of those officers who are
not volunteers (who have not forwarded test scores to AFIT)
is based largely on UGPA. Considering that the
correlations found between UGPA and GGPA in this study
range from .15 to .34, predictions based on UGPA alone are
questionable, especially when better information is
available. This practice results in the use of a different
set of predictors (and predictor weights) for those who
have furnished AFIT with test score data and those who have
not. However unavoidable they may be, these circumstances
result in more stringent screening of volunteers for AFIT
than of non-volunteers, to the non-volunteers' benefit. A
procedure that uniformly applied standard cut-off scores
for all criteria to all applicants would at least ensure
that all applicants were considered on the same basis.

You may recall the study by Borg (1963) that was
discussed earlier. He found that the use of a single
cut-off score for the GRE Verbal test would have denied
admission to 41 successful students as well as 21
unsuccessful students. Since this process would have
eliminated nearly twice as many successful students as
unsuccessful students from consideration, he recommended
against its use. Because of their cumulative effects, the
Air Force's use of multiple cut-off scores may be producing
even more undesirable effects. Determining the extent of
these effects would be extremely difficult because the use
of multiple cut-off scores results in relationships that
are non-linear. Predictions based on multiple cut-off
criteria involve complicated mathematics and can only be
made for a given set of scores.


Best  prediction  models 

Prediction models were developed using a stepwise
linear regression program. A series of regression models
was calculated. To ensure that the best combination of
variables was used, each of the predictor variables was
dropped from the equation in turn. In most cases, at least
one variable was dropped from the regression model before
the highest multiple R was achieved. The "best models"
shown below were chosen on the basis of a comparison of
multiple R's.




Table 7

Multiple regression equation
(entire sample using cases with both GRE and GMAT)

PREDICTOR       WEIGHT
GMTT            +0.00279892?
GRET            +0.002024500
GREV            -0.002382224
GMTQ            -0.026734060
UGPA            -0.081269790
CONSTANT        +1.999044000

MULTIPLE R      0.51692
SAMPLE SIZE     108


Table 8

Multiple regression equation
(entire sample using cases with GMAT)

PREDICTOR       WEIGHT
GMTV            +0.021495910
GMTQ            +0.005746557
UGPA            +0.035720140
CONSTANT        +2.601640000

MULTIPLE R      0.4780?
SAMPLE SIZE     364


Table 9

Multiple regression equation
(entire sample using cases with GRE)

PREDICTOR       WEIGHT
GREA            +0.001622894
UGPA            +0.139886900
GREV            -0.001924470
GRET            +0.001060860
CONSTANT        +1.941677000

MULTIPLE R      0.49388
SAMPLE SIZE     419

Table 10

Multiple regression equation (Group #1)

ASTRONAUTICAL ENGINEERING
SYSTEMS MANAGEMENT
SYSTEMS ENGINEERING

PREDICTOR       WEIGHT
GRET            -0.004151110
UGPA            -0.255726700
GREQ            +0.007465280
GREV            +0.005263321
CONSTANT        +1.364894000

MULTIPLE R      0.71036
SAMPLE SIZE     161


Table 11

Multiple regression equation (Group #2)

STRATEGY AND TACTICS (O.R.)
ELECTRICAL ENG (OPTICS)
ELECTRICAL ENGINEERING

PREDICTOR       WEIGHT
GREQ            +0.001219023
UGPA            +0.421528200
CYRS            +0.045127900
GREA            +0.001149079
GREV            -0.001095729
CONSTANT        +0.981167700

MULTIPLE R      0.5762
SAMPLE SIZE     117


Table 12

Multiple regression equation (Group #3)

LOGISTICS MANAGEMENT
ENGINEERING MANAGEMENT
CONTRACTING MANAGEMENT
ACQUISITION MANAGEMENT

PREDICTOR       WEIGHT
GMTV            +0.008191018
GMTQ            +0.008754303
UGPA            +0.225509900
CYRS            +0.026706010
NMAT            +0.036887990
CONSTANT        +2.122747000

MULTIPLE R      0.55204
SAMPLE SIZE     187


Table 13

Multiple regression equation (Group #4)

AERONAUTICAL ENGINEERING
ENGINEERING PHYSICS
OPERATIONS RESEARCH

PREDICTOR       WEIGHT
UGPA            +0.477362800
GREQ            -0.005689860
GREV            +0.010004080
GRET            +0.008815480
CONSTANT        +0.404797700

MULTIPLE R      0.69005
SAMPLE SIZE     245

Table 14

Multiple regression equation (Group #5)

COMPUTER SCIENCE
NUCLEAR EFFECTS ENGINEERING

PREDICTOR       WEIGHT
GRET            +0.003032275
GREV            -0.004153440
UGPA            +0.175798500
CONSTANT        +1.495748000

MULTIPLE R      0.64412
SAMPLE SIZE     133

Group #3 contains all the programs in the School of
Systems and Logistics, with the exception of Systems
Management. It was the only group in which there were
enough cases with GMAT scores to permit a comparison of
models based on GRE and GMAT. The model based on cases
with GMAT scores was the better of the two. (In the
GRE-based model, multiple R = .49, N = 127.)

GREA is a relatively new subtest of the GRE.
Because it is new, this variable was missing from many
cases, and including it in a model reduced the sample size.
However, GREA's correlation with GGPA was generally one of
the highest for each of the groups. For this reason, and
because of the need to establish its contribution to
prediction in the various programs, it was included in as
many prediction models as possible. In every case in which
it was included, it increased the multiple correlation
coefficient over that found without it.

These prediction models confirm the findings
reported in the first section of this chapter: each of the
program groups has its own unique "best" set of predictors
and predictor weights. Furthermore, the different weights
the variables take on in the linear models demonstrate the
importance of using statistical means to establish a
selection formula.
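As a minimal sketch of how one of these equations becomes a selection formula, the Python below applies the Table 14 (Group #5) weights to an applicant's predictor values. The applicant's scores are hypothetical, and the scales the weights expect (for example, whether GRET is the sum of GREV and GREQ on the 200-800 scale) are assumptions; this section does not state them.

```python
# Weights and constant transcribed from Table 14 (Group #5).
GROUP5_WEIGHTS = {"GRET": 0.003032275, "GREV": -0.004153440, "UGPA": 0.175798500}
GROUP5_CONSTANT = 1.495748000


def predicted_ggpa(scores: dict[str, float],
                   weights: dict[str, float] = GROUP5_WEIGHTS,
                   constant: float = GROUP5_CONSTANT) -> float:
    """Linear selection formula: constant plus the weighted sum of predictors."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing predictors: {sorted(missing)}")
    return constant + sum(w * scores[name] for name, w in weights.items())


# Hypothetical applicant; predictor scales are assumed, not taken from the thesis.
applicant = {"GRET": 1200, "GREV": 550, "UGPA": 3.2}
print(round(predicted_ggpa(applicant), 2))  # 3.41
```

The function is generic, so another group's equation can be supplied through the `weights` and `constant` parameters.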




Economic  analysis 


In Chapter I the costs associated with sponsoring a
student in AFIT resident master's degree programs were
mentioned as justification for this research. For
convenience, the figures are repeated here.

Engineering School cost = $82,892.68 per student
Logistics School cost = $67,258.66 per student

In this study it was determined that 145 Engineering School
students and 63 Logistics School students failed to
graduate with their classmates. If each non-graduate (or
late graduate) is assumed to represent a total loss on the
Air Force's investment, the cost of selection errors can be
determined easily.

145 x $82,892.68 = $12,019,439.00
63 x $67,258.66 = $4,237,295.60

Total = $16,256,734.60

The assumption that a non-graduate represents a total loss
on the investment seems more reasonable when you consider
that a student who would have graduated could have been
selected instead.
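The total-loss calculation above is a sum of products, using 145 non-graduates at $82,892.68 and 63 at $67,258.66. The Python sketch below reproduces it with exact decimal arithmetic and spreads the total over the six years covered by the study. Its products differ from the printed line items by less than a dollar, presumably because those were computed from unrounded per-student costs.

```python
from decimal import Decimal

# (non-graduates or late graduates, cost per student) for each school,
# as listed in the economic analysis.
SCHOOLS = {
    "Engineering": (145, Decimal("82892.68")),
    "Logistics": (63, Decimal("67258.66")),
}
YEARS_IN_STUDY = 6


def selection_error_cost(schools):
    """Total-loss assumption: each non-graduate forfeits the full per-student cost."""
    by_school = {name: count * cost for name, (count, cost) in schools.items()}
    return by_school, sum(by_school.values(), Decimal("0"))


by_school, total = selection_error_cost(SCHOOLS)
for name, loss in by_school.items():
    print(f"{name}: ${loss:,.2f}")
print(f"Total: ${total:,.2f}; per year: ${total / YEARS_IN_STUDY:,.2f}")
```

The per-year figure comes to about $2.71 million, which is the basis for the "more than $2.7 million per year" statement that follows.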

In the previous section, which examined the validity
of current selection procedures, it was noted that a
prediction model with a validity of .65 or better would
increase the graduation rate from 90.4% to 99%. The
validities of the models developed for the five program
groups in this study range from .55 to .71. The uniform
application of these models should increase the graduation
rate to between 97% and 100%. When you consider that the
Air Force's loss through incorrect admissions decisions
averaged more than $2.7 million per year over the six years
included in this study, investing a fraction of that amount
to implement a new selection strategy makes good sense.




CHAPTER  IV 


DISCUSSION  AND  CONCLUSIONS 

A  review  of  the  hypotheses 

The first hypothesis stated that statistically
significant differences in predictor/criterion correlations
would be found when the correlations were compared across
AFIT programs. The correlations between the predictor
variables and GGPA varied significantly between AFIT
master's degree programs, adding support to the findings of
Madaus and Walsh (1965). Furthermore, for many AFIT
master's degree programs, the predictor/GGPA correlations
were not significant at the .05 or even the .1 significance
level. This finding was unexpected. It indicates that using
variables simply because they appear to be logically related
to the criterion is unsound: until the validity of a
predictor is demonstrated statistically, it should not be
used.

The  second  hypothesis  is  related  to  the  first.  It 
stated  that  correlation  coefficients  for  the  entire  sample 
would  be  lower  than  some  of  those  computed  for  individual 
programs.  It  was  also  supported. 




The third hypothesis, that the regression models
developed for statistically combined program groups would
differ in terms of their predictors and predictor weights,
was supported.

The fourth hypothesis, that GRE scores, GMAT
scores, and UGPA are valid predictors of GGPA, can only be
supported for some of the programs studied. For example,
the correlation of GREV with GGPA is statistically
significant at the .05 level in only 6 of the 17 master's
programs. Even when GREV is a statistically significant
predictor of GGPA, the correlations differ widely from
program to program. The correlations for GREV with GGPA
range from -.447 in the Aeronautical Engineering program to
.474 in the Systems Engineering program, but the
correlation for the entire sample was .163. Assuming that
the correlation is the same for all three groups is a
serious error. The other predictors followed the same
pattern as GREV, though not to the same extreme. These
predictors should be used only in specific situations in
which their correlations with the criterion are known.

The fifth hypothesis, that background variables
would add to the accuracy of at least one of the prediction
models, was supported. CYRS entered two of the final
prediction models and NMAT entered one.


The last hypothesis, that the models developed in
this study are more accurate than AFIT's current selection
procedure, was also supported. This finding was expected.
The literature contains a great deal of support for the use
of statistical procedures in solving problems of this kind,
and it offers very little support for the use of judgement
or intuition.

Discussion

This study demonstrates the concept of differential
validity. The correlation coefficients calculated show
that the differences between programs in a single graduate
school can be significant. The prediction models developed
through multiple regression add support to that finding
and show that different sets of predictors are appropriate
for different programs. It is also evident from the range
of correlations calculated for the various programs that
success in some programs can be predicted more accurately
than success in others.

The use of statistical procedures to compare the
relationships between predictors and GGPA within the 17
programs showed that some programs did not differ
significantly on as many as three predictor/criterion
relationships. The benefits of grouping programs in this
manner, rather than through clinical inference, are
demonstrated by the relatively high multiple correlation
coefficients that were achieved for the grouped programs.
This technique holds promise for many situations in which
large individual samples do not exist. As far as can be
determined, this is the first validity study to combine
groups statistically for prediction of success in graduate
school.
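This chapter does not say which test produced the between-program p-values. A common choice for comparing correlations from two independent samples is Fisher's r-to-z transformation, sketched below in Python as an assumption rather than as the study's documented procedure. The correlations are the GREV/GGPA values quoted earlier in this chapter for Aeronautical Engineering and Systems Engineering; the sample sizes of 50 are hypothetical.

```python
import math
from statistics import NormalDist


def compare_independent_correlations(r1: float, n1: int, r2: float, n2: int) -> float:
    """Two-tailed p-value for H0: rho1 == rho2, using Fisher's r-to-z transformation."""
    if n1 <= 3 or n2 <= 3:
        raise ValueError("each sample needs more than 3 cases")
    z1, z2 = math.atanh(r1), math.atanh(r2)  # Fisher z for each sample
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# GREV/GGPA correlations from this chapter; the sample sizes are hypothetical.
p = compare_independent_correlations(-0.447, 50, 0.474, 50)
print(f"p = {p:.6f}")
```

With those assumed sample sizes the p-value is far below .001, so the two programs would not be pooled for this predictor.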

Other  findings 

Some other interesting variables were also examined,
including commissioned years (CYRS) and enlisted years
(EYRS). CYRS provided low but statistically significant
correlations in 5 of the academic programs. EYRS was
statistically significant only in the 6 of the 17 academic
programs where the proportion of officers with prior
enlisted service was fairly large. In 5 of these programs
its correlations with GGPA ranged from -.51 to -.75,
indicating that officers with enlisted experience may be at
a substantial disadvantage in graduate school. Where the
numbers are great enough for it to assume significance,
this variable could be very useful.

The effects of moderator variables were
investigated early in this study. This line of research
was dropped because the key predictors had a large number
of missing values, and selecting cases based on moderator
variables drastically reduced sample sizes. However, some
interesting effects were noted. The relationships between
the predictors and GGPA were stronger for service academy
graduates than for those who obtained undergraduate degrees
from other sources. The performance of Second Lieutenants
in engineering programs was well below average performance
in those programs. Predictor/criterion correlations for
married officers were higher than for single officers in
most of the programs. While moderator variables were not
especially useful in this study, these findings indicate
that there may be a great many variables that are useful in
predicting performance.

Conclusions

AFIT's present selection accuracy is better than
what could be expected at a private university. The
validity study described in this thesis relied on
well-established psychological measurement techniques, but
it combined them in a new way. As a result, it has shown
that selecting students through these methods could result
in even better selection accuracy than presently exists.

Selecting students for graduate school is no simple
task. The relationship between success and past
performance varies from one situation to another. This
study has demonstrated that the correlates of success vary
among the resident master's degree programs at the Air
Force Institute of Technology. It has established the
validity of current selection procedures, of five proposed
selection models, and of several predictor variables.
Since the predictor data are already contained in the
academic or military records of potential students, the
study offers the Air Force some new tools to aid the
selection process. More importantly, it has shown that a
selection procedure that uses multiple cut-off scores only
for absolutely essential prerequisites, and uses a linear
model incorporating other relevant variables to predict
performance on the criterion, would result in improved
selection.

The structure of the Air Force personnel assignment
system and the dual procedure for determining eligibility
for AFIT programs complicate the selection process. The
concept of selecting those best qualified for graduate
education is certainly appropriate, but it may be difficult
in this environment. Because the selection system is a
subset of the assignment system, some compromises are
probably necessary. Early notification of eligibles,
including those centrally identified, and the requirement
that all these officers submit test scores before
receiving an assignment to an AFIT graduate program would
improve the process.




The School of Systems and Logistics has recently
begun emphasizing the requirement that all applicants
submit test scores. This effort primarily influences the
1984 class and subsequent classes. It is a step in the
right direction. Models such as those developed in this
study are effective and relatively easy to use if all the
data are available. If data are not available, the
practical benefits they offer are limited.

This study points to a larger issue: the human cost
involved in selecting some applicants and rejecting
others. The cost of choosing someone who will eventually
fail is high for that individual. By the same token, the
cost of rejecting someone who could have succeeded is
large. In many cases limited resources, differences in
ability, and external constraints make this cost
unavoidable, but it should be minimized whenever possible.
It may be difficult to translate into dollars and cents,
but it is real.




APPENDIX  A 

DEMOGRAPHIC  INFORMATION 


[Figure: GREQ score distribution (1977-1982).]

[Figure: number of students by grade: 2LT, 1LT, CPT, MAJ, LTC, COL, CIV (totals include foreign students).]


TABLE  OF  P  VALUES 

CALCULATED  IN  GRET/GGPA  CORRELATION  TEST 
BETWEEN  PROGRAM  CORRELATION  COEFFICIENTS 
FOR  GRET  WITH  GGPA 


         SysE   Strat  Contr  EMgt   SysMgt  Log
Astro    .414   .260   .039   .010   .942    .031
Comp     .036   .826   .458   .180   .147    .513
EEng     .003   .271   .472   .689   .009    .500
GEO      .239   .860   .647   .312   .238    .653
GEP      .007   .323   .926   .773   .028    .395
Nucl     .073   .754   .689   .459   .234    .834
OpsR     .003   .270   .834   .881   .018    .454
SysE            .093   .013   .000   .363    .007
Strat                  .472   .197   .320    .470
Contr                         .723   .059    .734
EMgt                                 .009    .336
SysMgt                                       .028

If P < Alpha, reject the hypothesis that the two
programs are from the same population of students.

TABLE  OF  P  VALUES 

CALCULATED  IN  GREV/GGPA  CORRELATION  TEST 
BETWEEN  PROGRAM  CORRELATION  COEFFICIENTS 
FOR  GREV  WITH  GGPA 


         Aero   Nucl   SysEng  SysMgt  LOG
Astro    .000   .339   .153    .165    .357
Aero            .000   .000    .000    .000
Nucl                   .263    .317    .395
SysEng                         .764    .014
SysMgt                                 .003

If P < Alpha, reject the hypothesis that the two
programs are from the same population of students.


TABLE  OF  P  VALUES 

CALCULATED  IN  GREQ/GGPA  CORRELATION  TEST 
BETWEEN  PROGRAM  CORRELATION  COEFFICIENTS 
FOR  GREQ  WITH  GGPA 


I  Aer 


o  1 

C 

1  EEn 

Astro 

Aero 

Comp 

EEng 

GEO 

GEP 


.035  I  .28?  I  .002 

1  .213  I  .368 

I  I  .023 


Astro 

.636 

1 

.212 

1 

.051 

1 

.003 

1 

.004 

1 

.303 

Aero 

.21? 

1 

.373 

i 

.833 

1 

.262 

1 

.510 

1 

.308 

Comp 

.736 

1 

.681 

i 

.236 

1 

.021 

1 

.035 

1 

.944 

EEng 

.061 

1 

.207 

1 

.66  0 

1 

.702 

1 

.729 

1 

.060 

GEO 

.317 

1 

.664 

1 

.853 

1 

.395 

1 

.646 

1 

.441 

GEP 

.073 

1 

.617 

1 

.617 

1 

.861 

1 

.682 

1 

.370 

OpsR 

.031 

1 

.163 

1 

.496 

1 

.976 

1 

.509 

1 

.057 

SysE 

1 

.513 

1 

.211 

1 

.047 

1 

.084 

1 

.681 

Strat 

1 

1 

.513 

1 

.153 

1 

.276 

1 

.74? 

Con  tr 

1 

1 

.495 

1 

.803 

.303 

EMgt 

1 

1 

1 

1 

.496 

1 

.048 

Log 

1 

1 

1 

1 

.068 

If  P  <  Alpha,  reject  the  hypothesis  that  the  two 
programs  are  from  the  same  population  of  students 


TABLE  OF  P  VALUES 


CALCULATED  IN  UGPA/GGPA  CORRELATION  TEST 


BETWEEN 

PROGRAM 

FOR 

CORRELATION  COEFFICIENTS 

UGPA  WITH  GGPA 

1 

Aero 

1 

ComD 

1 

■ 

GEP  1 

Nuc  1 

Astro 

i 

! 

.972 

1 

1 

.822 

1  .410 

1 

I 

1 

'.210  1 

.739 

Aero 

1 

1 

.749 

1  .318 

1 

.124  1 

.741 

ComD 

1 

1 

9E£TiK 

1  .478 

1 

.230  I 

.920 

EEng 

1 

1 

1  .478 

i 

.204  1 

.960 

GEO 

1 

1 

1 

.719  1 

.660 

GEP 

1 

I 

1 

- 

i  - 

1 

1 

.441 

1 

OpsR 

J_ 

Strat 

1 

LOG 

1 

Astro 

1 

.481 

1 

.456 

1 

.744 

Aero 

1 

.339 

.  366 

1 

.698 

Comp 

i 

.582 

1 

.535 

1 

.457 

EEng 

! 

.589 

1 

.542 

1 

.265 

GEO 

1 

.849 

1 

.936 

1 

.166 

GEP 

1 

.555 

1 

.654 

1 

.038 

OpsR 

! 

1 

.916 

i 

.196 

Strat 

1 

1 

.198 

If  P  <  Alpha,  reject  the  hypothesis  that  the  two 
programs  are  from  the  same  population  of  students 


MATRIX OF CORRELATION COEFFICIENTS

ENTIRE AFIT SAMPLE

        GMTT   GMTV   GMTQ   GRET   GREV   GREQ   UGPA   GGPA
GMTT   1.000   .918   .863   .803   .688   .662   .298   .434
GMTV          1.000   .543   .655   .664   .381   .238   .400
GMTQ                 1.000   .768   .501   .792   .273   .366
GRET                        1.000   .870   .843   .316   .396
GREV                               1.000   .425   .128   .262
GREQ                                      1.000   .432   .364
UGPA                                             1.000   .145
GGPA                                                    1.000

MATRIX OF CORRELATION COEFFICIENTS

GROUP #1

ASTRONAUTICAL ENGINEERING
SYSTEMS ENGINEERING
SYSTEMS MANAGEMENT

        GRET   GREV   GREQ   UGPA   GGPA
GRET   1.000   .859   .871   .215   .658
GREV          1.000   .505   .087   .538
GREQ                 1.000   .257   .622
UGPA                        1.000  -.052
GGPA                               1.000


MATRIX  OF  CORRELATION  COEFFICIENTS 


■5TE3TKV"  AND  TaCTTCS'  nJXT 

ELECTRICAL  ENGINEERING  OPTICS 
ELECTRICAL  ENGINEERING 


        GRET   GREV   GREQ   GREA   CYRS   UGPA   GGPA
GRET   1.000   .878   .863   .634   .067   .212   .308
GREV          1.000   .501   .461   .046   .140   .129
GREQ                 1.000   .472   .037   .216   .367
GREA                        1.000  -.208   .033   .163
CYRS                               1.000  -.374   .167
UGPA                                      1.000   .341
GGPA                                             1.000

MATRIX OF CORRELATION COEFFICIENTS
GROUP #3

LOGISTICS MANAGEMENT
ENGINEERING MANAGEMENT
CONTRACTING MANAGEMENT
ACQUISITION MANAGEMENT

        GRET   GREV   GREQ   GREA   CYRS   NMAT   UGPA   GGPA
GRET   1.000   .827   .847   .693   .018   .333   .102   .372
GREV          1.000   .333   .636   .024   .080   .158   .23?
GREQ                 1.000   .465   .012   .434  -.011   .374
GREA                        1.000   .107   .108   .142   .531
CYRS                               1.000   .044  -.514   .139
NMAT                                      1.000  -.240   .160
UGPA                                             1.000   .158
GGPA                                                    1.000

MATRIX OF CORRELATION COEFFICIENTS

GROUP #4

AERONAUTICAL ENGINEERING
ENGINEERING PHYSICS
OPERATIONS RESEARCH

        GRET   GREV   GREQ   UGPA
GRET   1.000
GREV    .88?  1.000
GREQ    .700   .313  1.000
UGPA    .274   .238   .216  1.000


MATRIX OF CORRELATION COEFFICIENTS

GROUP #5

COMPUTER SCIENCE
NUCLEAR ENGINEERING

        GRET   GREV   GREQ   UGPA
GRET   1.000   .856   .858   .230
GREV          1.000   .439   .15?
GREQ                 1.000   .215
UGPA                        1.000


APPENDIX  D 

METHOD  USED  TO  ESTIMATE  THE  VALIDITY 
OF  CURRENT  AFIT  SELECTION  PROCEDURES 



Mean GREV and GREQ scores for the unrestricted
group were obtained from an Educational Testing Service
report furnished to AFIT (Educational Testing Service,
1981). The means were calculated using all scores reported
to AFIT between October 1980 and October 1981, and were
based on data from non-selectees as well as selectees. The
ratio of the scores from this unrestricted group to those
of the students selected for AFIT (the restricted group)
provided an index that was used to estimate what the mean
GGPA would have been had all applicants who met essential
criteria been accepted.

The mean (unrestricted) GGPA was estimated by
multiplying the AFIT group GGPA by each of these indexes,
summing the products, and dividing by 2. This method was
used to ensure that the estimate would be conservative.

This figure was converted to a Z score by
subtracting the critical GGPA (3.0) and dividing by the
unrestricted standard deviation, which was calculated in
the same manner. The Z score was then converted into the
corresponding area of the normal curve between the mean and
Z. This area (.19) was added to the area on the other side
of the mean (0.5). The result is the estimate of the
percentage of (unrestricted) students who would have
earned a GGPA of 3.0 or better (69%).

With this information and the selection ratio, the
Taylor-Russell tables in Appendix E can be used to estimate
the validity of AFIT's current selection procedures. The
table for a proportion satisfactory of .70 shows that, with
a selection ratio of .05, the validity of the current
procedures must fall between .30 and .35.
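The Taylor-Russell tables assume that predictor and criterion scores are bivariate normal; each entry is the proportion of selected applicants expected to succeed. The Python sketch below computes an entry directly under that assumption by integrating over the selected predictor range with Simpson's rule. The numerical method is an illustrative choice, not the procedure used to build the published tables.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def taylor_russell(validity: float, base_rate: float, selection_ratio: float,
                   steps: int = 2000) -> float:
    """Expected proportion of selectees who succeed, assuming predictor and
    criterion are standard bivariate normal with correlation `validity`."""
    if not 0.0 <= validity < 1.0:
        raise ValueError("validity must be in [0, 1)")
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    x_cut = STD_NORMAL.inv_cdf(1.0 - selection_ratio)  # top share of applicants selected
    y_cut = STD_NORMAL.inv_cdf(1.0 - base_rate)        # base_rate of all applicants succeed
    resid_sd = (1.0 - validity ** 2) ** 0.5
    h = 10.0 / steps                                   # integrate X over [x_cut, x_cut + 10]
    total = 0.0
    for i in range(steps + 1):
        x = x_cut + i * h
        # P(Y > y_cut | X = x), since Y | X = x ~ N(validity * x, 1 - validity**2)
        p_success = 1.0 - STD_NORMAL.cdf((y_cut - validity * x) / resid_sd)
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += weight * STD_NORMAL.pdf(x) * p_success
    joint = total * h / 3.0          # P(X > x_cut and Y > y_cut)
    return joint / selection_ratio   # divide by P(X > x_cut)


for r in (0.30, 0.35):
    print(f"validity {r:.2f}: {taylor_russell(r, base_rate=0.70, selection_ratio=0.05):.2f}")
```

For a base rate of .70 and a selection ratio of .05 this prints about .88 and .91 for validities of .30 and .35, which bracket the 90.4% graduation rate reported in Chapter III.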


COMPUTATIONS

Step 1

$$\text{Ratio \#1} = \frac{\text{Unrestricted group GREV mean score}}{\text{AFIT student group GREV mean score}} = \frac{520}{532.5} = .976$$

$$\text{Ratio \#2} = \frac{\text{Unrestricted group GREQ mean score}}{\text{AFIT student group GREQ mean score}} = \frac{609}{663.8} = .917$$

Step 2

$$(\text{Ratio \#1})(\text{AFIT mean GGPA}) = (.976)(3.4793) = 3.3957$$

$$(\text{Ratio \#2})(\text{AFIT mean GGPA}) = (.917)(3.4793) = 3.1905$$

$$3.3957 + 3.1905 = 6.5862$$

$$(6.5862)(0.5) = 3.293$$

3.293 = Estimated mean GGPA for an unrestricted group of students

Step 4

$$Z = \frac{\text{Estimated mean GGPA} - \text{Pass/fail score}}{\text{Unrestricted GGPA standard deviation}} = \frac{3.293 - 3.0}{.5895} = .4970$$

Area of the normal curve between the mean and Z = .4970: .19

$$.5 + .19 = .69$$

.69 = the proportion of unrestricted applicants who could be
expected to pass, given that multiple cut-off scores were
not used except to establish that absolutely essential
prerequisites had been satisfied
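The arithmetic in these computations can be rerun in a few lines of Python. This sketch follows the steps described in this appendix but does not round the two ratios to three places first, so the intermediate figures differ slightly (about 3.295 and .500 instead of 3.293 and .4970) while the estimated proportion passing is still .69. The unrestricted standard deviation (.5895) is taken as given, since its derivation is not shown.

```python
from statistics import NormalDist


def estimate_pass_rate(unrestricted_means, afit_means, afit_mean_ggpa,
                       unrestricted_sd, critical_ggpa=3.0):
    """Scale AFIT's mean GGPA by each unrestricted/AFIT score ratio, average
    the results, then find P(GGPA >= critical_ggpa) under a normal curve."""
    ratios = [u / a for u, a in zip(unrestricted_means, afit_means)]
    est_mean = sum(r * afit_mean_ggpa for r in ratios) / len(ratios)
    z = (est_mean - critical_ggpa) / unrestricted_sd
    # For positive z, cdf(z) equals .5 plus the area between the mean and z.
    return est_mean, z, NormalDist().cdf(z)


est_mean, z, pass_rate = estimate_pass_rate(
    unrestricted_means=[520, 609],   # GREV, GREQ means, unrestricted group
    afit_means=[532.5, 663.8],       # GREV, GREQ means, AFIT students
    afit_mean_ggpa=3.4793,
    unrestricted_sd=0.5895,
)
print(f"mean = {est_mean:.3f}, Z = {z:.3f}, proportion passing = {pass_rate:.2f}")
```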


[Table: Taylor-Russell table, Proportion of Employees Considered Satisfactory; rows: validity coefficient (.00 to 1.00); columns: selection ratio.]


A. REFERENCES CITED

Air Force Institute of Technology. AFIT Education
Newsletter, Air Force Institute of Technology/RR,
AURP 53-2, Wright-Patterson AFB OH, 1982.

Air Force Institute of Technology. AFIT Selectees/Tentative
Selectees and EWI Eligibles (WYA3), Unpublished
computer listing, Air Force Institute of
Technology/RR, Wright-Patterson AFB OH, 10 July
1983.

Air Force Institute of Technology. Financial Report 096CR,
Unpublished funds report, Air Force Institute of
Technology/ACB, Wright-Patterson AFB OH, 1981.

Air Force Magazine, An Air Force almanac: the United States
Air Force in facts and figures, Washington DC: The
Air Force Association, May 1983, 169.

American Psychological Association, Standards for
educational and psychological tests and manuals,
Washington DC: American Psychological Association,
1966.

Anastasi, A., Psychological Testing, New York: Macmillan,
1976.

Baird, L. L., Comparative prediction of first-year graduate
and professional school grades in six fields,
Educational and Psychological Measurement, 1975,
35, 941-946.

Bigelow, C. P., Chief, Evaluation and Counseling Section,
AFIT/RR, Wright-Patterson AFB OH. Personal
interview. 16 August 1983.

Borg, W. R., GRE aptitude scores as predictors of GPA for
graduate students in education, Educational and
Psychological Measurement, 1963, 23, (2), 379-382.

Breaugh, J. A. & Mann, R. B., The utility of discriminant
analysis for predicting graduation from a master of
business administration program, Educational and
Psychological Measurement, 1981, 41, 495-501.

Brogden, H. E., On the interpretation of the correlation
coefficient as a measure of predictive efficiency,
The Journal of Educational Psychology, 1946, 37,
(2), 65-76.

Camp, J. & Clawson, T., The relationship between the
graduate record examinations aptitude test and
grade point average in a master of arts in
counseling program, Educational and Psychological
Measurement, 1979, 39, 429-431.

Cronbach, L. J., Essentials of Psychological Testing,
4th ed., New York: Harper and Row Publishers, 1970.

Cohen, J. & Cohen, P., Applied Multiple
Regression/Correlation Analysis for the Behavioral
Sciences, New York: Halsted Press, 1975.

Covert, R. W. & Chansky, N. M., The moderator effect of
undergraduate grade point average on the prediction
of success in graduate education, Educational and
Psychological Measurement, 1975, 35, 947-950.

Cureton, E. E., Validity, reliability, and baloney,
Educational and Psychological Measurement, 1950,
10, 94-96.

Educational Testing Service, GRE 1981-1982: guide to the
use of the graduate record examinations, Princeton
NJ: Educational Testing Service, 1981.

Educational Testing Service, Graduate institution summary
statistics report, Unpublished computer listing,
Princeton NJ: Educational Testing Service, 1981.

Furst, E. J., Theoretical problems in the selection of
students for professional schools, Educational and
Psychological Measurement, 1950, (10), 943-952.

Furst, E. J. & Roelfs, P., Validation of the graduate
record examinations and the Miller's analogies test
in a doctoral program in education, Educational
and Psychological Measurement, 1979, (39),
145-151.

Green, B. F., A primer of testing, American Psychologist,
1981, 36, (10), 1001-1011.

Graduate Management Admissions Council, Guide to the use of
GMAT scores 1982-1983, Princeton NJ: Educational
Testing Service, 1982.

Guion, R. M., Personnel Testing, New York: McGraw-Hill,
Inc., 1965.

Hecht, L. W. & Powers, D. E., The predictive validity of
preadmission measures in graduate management
education: three years of the GMAC validity study
service, Princeton NJ: Educational Testing
Service, 1982.

Jensen, R. E., Predicting scholastic achievement of
first-year graduate students, Educational and
Psychological Measurement, 1953, (13), 323-329.

Knapp, J. & Hamilton, I. B., The effect of non-standard
undergraduate assessment and reporting practices on
the graduate admissions process, Princeton NJ:
Educational Testing Service, 1978.

Lewis, J. W., The relationship of selected variables to
achievement and persistence in a masters program in
business education, Educational and Psychological
Measurement, 1960, 20, (4), 847-851.

Lin, P. & Humphreys, L. G., Predictions of academic
performance in graduate and professional school,
Applied Psychological Measurement, 1977, 1, (2),
249-257.

Livingston, S. A. & Turner, N. J., Effectiveness of the
graduate record examinations for predicting first
year grades: 1980-1981 summary report of the graduate
record examinations validity study service,
Princeton NJ: Educational Testing Service, 1982.

Madaus, G. F. & Walsh, J. J., Departmental differentials in
the predictive validity of the graduate record
examination aptitude tests, Educational and
Psychological Measurement, 1965, 25, (4), 1105-1110.

Marston, A. R., It is time to reconsider the graduate
record examination, American Psychologist, 1971,
26, 653-655.

Mehrabian, A., Undergraduate ability factors in
relationship to graduate performance, Educational
and Psychological Measurement, 1969, 29, 409-419.

Mittman, A. & Lewis, J. W., Correlates of achievement on
the admissions test for graduate study in business,
Educational and Psychological Measurement, 1965,
25, (2), 585-588.

Nagi, J. L., Predictive validity of the graduate record
examination and the Miller analogies test,
Educational and Psychological Measurement, 1975,
35, 471-472.

Nie, N. H., Hull, C. H., Jenkins, J. G., Steinbrenner, K., &
Bent, D. H., SPSS: Statistical Package for the
Social Sciences, 2nd ed., New York: McGraw-Hill
Book Company, 1975.

Robertson, M. & Nielsen, W., The graduate record
examination and selection of graduate students,
American Psychologist, 1961, (10), 648-650.

Sawyer, J. T., Measurement and prediction, clinical and
statistical, Psychological Bulletin, 1966, 66,
(3), 178-200.

Taylor, H. C. & Russell, J. T., The relationship of
validity coefficients to the practical
effectiveness of tests in selection: discussion and
tables, Journal of Applied Psychology, 1939, 23,
565-578.

Thacker, A. J. & Williams, R. E., The relationship of the
graduate record examination to grade point average
and success in graduate school, Educational and
Psychological Measurement, 1974, 34, 939-944.

Thorndike, R. L., Personnel Selection: Test and Measurement
Techniques, New York: John Wiley and Sons,
Inc., 1949.

Travers, R. M., Personnel selection and classification as a
laboratory science, Educational and Psychological
Measurement, 1956, 16, 195-208.

Travers, R. M. & Wallace, W. L., The assessment of the
academic aptitude of the graduate student,
Educational and Psychological Measurement, 1950,
10, 371-379.

Traxler, A. E., Tests for graduate students, Journal of
Higher Education, 1952, 23, 473-482.

U.S. Department of the Air Force. USAF Formal Schools
Catalog, AFM 50-5, Volume 1, Washington DC:
Government Printing Office, 1981.

Willingham, W. W., Predicting success in graduate
education, in papers presented to the graduate
record examination board research service at the
12th annual meeting of the council of graduate
schools, Princeton NJ: Educational Testing
Service, 1981.

Womer, F. B., Basic Concepts in Testing, New York:
Houghton Mifflin Company, 1968.


B. RELATED SOURCES

Albright, L. E., Glennon, J. R., & Smith, W. J., The uses
of psychological tests in industry, Cleveland:
Howard Allen, 1963.

Ayers, J. B., Predicting quality point averages in
master's degree programs in education, Educational
and Psychological Measurement, 1971, 31, 491-495.

Kirnan, J. P. & Geisinger, K. F., The prediction of
graduate school success in psychology, Educational
and Psychological Measurement, 1981, 41, 815-820.

Lannholm, G. V., Review of the studies employing GRE scores
in predicting success in graduate study 1952-1967,
Princeton NJ: Educational Testing Service, 1972.

Saunders, D. R., Moderator variables in prediction,
Educational and Psychological Measurement, 1956,
16, 209-222.

Schmidt, F. L., Hunter, J. E., & Urry, V. W., Statistical
power in criterion-related validity studies,
Journal of Applied Psychology, 1976, 61, (4),
473-485.


